Team Information

Member Name                  Member NetId
Pushpit Saxena               pushpit2
Venslaus Prakash Arokiaraj   vpa2

Please note that this is an interim report that we are submitting on Coursera, as Prof. David has given us an extension until Sunday (08/09) to submit the final report (via Piazza). Please see here for details. (The final report will have a cleaner structure, and some of the experimental models shown here will not appear in it. Please use the final report for grading.)

Introduction

This project focuses on studying the factors that play a statistically significant role in influencing life expectancy. The factors we consider include economic factors, social factors, health-service factors (such as immunization levels), mortality rates, and various other health-related factors. We build several multiple linear regression models and apply concepts learned in this course (STAT 420) to analyze them and find appropriate models for predicting life expectancy.

Dataset

Background

Based on the description of the dataset on Kaggle, the Global Health Observatory (GHO) data repository under the World Health Organization (WHO) keeps track of health status and many other related factors for all countries. The datasets are made available to the public for the purpose of health data analysis. This dataset was collected from the WHO and United Nations websites, and the individual data files were then combined into a single dataset (read more here).

Description

The dataset we use for this project is the Life Expectancy data, which can be found at Life Expectancy (WHO). The dataset has 22 variables and 2,938 observations, and it needs some cleanup. (Note: we have also provided the dataset as part of the .zip [lifeExpectancyData] that we uploaded for this proposal.)

Following are some of the important variables used in this dataset:

  • Country (String): Country of observation

  • Year (Integer): Year of observation

  • Status (String): Whether the country of observation is developed or developing.

  • Life expectancy (Decimal): Life expectancy in years

  • Adult Mortality (Integer): Adult Mortality Rates of both sexes (probability of dying between 15 and 60 years per 1000 population)

  • Infant deaths (Integer): Number of Infant Deaths per 1000 population

  • Alcohol (Decimal): Alcohol, recorded per capita (15+) consumption (in litres of pure alcohol)

  • Percentage Expenditure (Decimal): Expenditure on health as a percentage of Gross Domestic Product per capita(%)

  • Hepatitis B (Int): Hepatitis B (HepB) immunization coverage among 1-year-olds (%)

  • Measles (Int): Measles - number of reported cases per 1000 population

  • BMI (Decimal): Average Body Mass Index of entire population

  • Under-five deaths (Int): Number of under-five deaths per 1000 population

  • Polio (Int): Polio (Pol3) immunization coverage among 1-year-olds (%)

  • Total expenditure (Decimal): General government expenditure on health as a percentage of total government expenditure (%)

  • Diphtheria (Int): Diphtheria tetanus toxoid and pertussis (DTP3) immunization coverage among 1-year-olds (%)

  • HIV/AIDS (Decimal): Deaths per 1,000 live births due to HIV/AIDS (0-4 years)

  • GDP (Decimal): Gross Domestic Product per capita (in USD)

  • Population (Int): Population of the country

Data Source

We have attached the data file in the .zip [lifeExpectancyData].

If needed, this dataset can also be downloaded from Kaggle.

Our Motivation

Data (Evidence)

## # A tibble: 2,938 x 22
##    Country  Year Status Life.expectancy Adult.Mortality infant.deaths Alcohol
##    <chr>   <int> <chr>            <dbl>           <int>         <int>   <dbl>
##  1 Afghan…  2015 Devel…            65               263            62    0.01
##  2 Afghan…  2014 Devel…            59.9             271            64    0.01
##  3 Afghan…  2013 Devel…            59.9             268            66    0.01
##  4 Afghan…  2012 Devel…            59.5             272            69    0.01
##  5 Afghan…  2011 Devel…            59.2             275            71    0.01
##  6 Afghan…  2010 Devel…            58.8             279            74    0.01
##  7 Afghan…  2009 Devel…            58.6             281            77    0.01
##  8 Afghan…  2008 Devel…            58.1             287            80    0.03
##  9 Afghan…  2007 Devel…            57.5             295            82    0.02
## 10 Afghan…  2006 Devel…            57.3             295            84    0.03
## # … with 2,928 more rows, and 15 more variables: percentage.expenditure <dbl>,
## #   Hepatitis.B <int>, Measles <int>, BMI <dbl>, under.five.deaths <int>,
## #   Polio <int>, Total.expenditure <dbl>, Diphtheria <int>, HIV.AIDS <dbl>,
## #   GDP <dbl>, Population <dbl>, thinness..1.19.years <dbl>,
## #   thinness.5.9.years <dbl>, Income.composition.of.resources <dbl>,
## #   Schooling <dbl>
##  [1] 65.0 59.9 59.9 59.5 59.2 58.8 58.6 58.1 57.5 57.3

Methods and Results

Data Cleaning:

Loading the Data:

Changing the names of the fields to follow a more consistent pattern (snake_case):
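The renaming code itself is not shown in this interim report. A minimal sketch of one way to do it in base R is given here; the helper name `to_snake_case` is hypothetical and not taken from our code, but it reproduces the field names that appear in the snippet that follows (for example `Life.expectancy` becomes `life_expectancy`).

```r
# Hypothetical helper (an assumption, not the report's actual code):
# convert raw column names to snake_case.
to_snake_case <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)  # collapse runs of dots/spaces into one "_"
  gsub("^_+|_+$", "", x)           # trim leading and trailing underscores
}

raw_names <- c("Country", "Life.expectancy", "HIV.AIDS", "thinness..1.19.years")
new_names <- to_snake_case(raw_names)
print(new_names)
```

On a data frame this could be applied as `names(df) <- to_snake_case(names(df))`, where `df` stands for whatever object holds the raw data.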

Snippet of the raw dataset:

## # A tibble: 2,938 x 24
##    country year  status life_expectancy adult_mortality infant_deaths alcohol
##    <fct>   <fct> <fct>            <dbl>           <int>         <int>   <dbl>
##  1 Afghan… 2015  Devel…            65               263            62    0.01
##  2 Afghan… 2014  Devel…            59.9             271            64    0.01
##  3 Afghan… 2013  Devel…            59.9             268            66    0.01
##  4 Afghan… 2012  Devel…            59.5             272            69    0.01
##  5 Afghan… 2011  Devel…            59.2             275            71    0.01
##  6 Afghan… 2010  Devel…            58.8             279            74    0.01
##  7 Afghan… 2009  Devel…            58.6             281            77    0.01
##  8 Afghan… 2008  Devel…            58.1             287            80    0.03
##  9 Afghan… 2007  Devel…            57.5             295            82    0.02
## 10 Afghan… 2006  Devel…            57.3             295            84    0.03
## # … with 2,928 more rows, and 17 more variables: percentage_expenditure <dbl>,
## #   hepatitis_b <int>, measles <int>, bmi <dbl>, under_five_deaths <int>,
## #   polio <int>, total_expenditure <dbl>, diphtheria <int>, hiv_aids <dbl>,
## #   gdp <dbl>, population <dbl>, thinness_1_19_years <dbl>,
## #   thinness_5_9_years <dbl>, income_composition_of_resources <dbl>,
## #   schooling <dbl>, continent <fct>, region <fct>

Summary of numeric fields:

Variable                          Min.    1st Qu.      Median         Mean     3rd Qu.           Max.   NA's
life_expectancy                  36.30      63.10       72.10        69.22       75.70          89.00   10.0
adult_mortality                   1.00      74.00      144.00       164.80      228.00         723.00   10.0
infant_deaths                     0.00       0.00        3.00        30.30       22.00        1800.00    0.0
alcohol                           0.01       0.88        3.76         4.60        7.70          17.87  194.0
percentage_expenditure            0.00       4.69       64.91       738.25      441.53       19479.91    0.0
hepatitis_b                       1.00      77.00       92.00        80.94       97.00          99.00  553.0
measles                           0.00       0.00       17.00      2419.59      360.25      212183.00    0.0
bmi                               1.00      19.30       43.50        38.32       56.20          87.30   34.0
under_five_deaths                 0.00       0.00        4.00        42.04       28.00        2500.00    0.0
polio                             3.00      78.00       93.00        82.55       97.00          99.00   19.0
total_expenditure                 0.37       4.26        5.76         5.94        7.49          17.60  226.0
diphtheria                        2.00      78.00       93.00        82.32       97.00          99.00   19.0
hiv_aids                          0.10       0.10        0.10         1.74        0.80          50.60    0.1
gdp                               1.68     463.94     1766.95      7483.16     5910.81      119172.74  448.0
population                       34.00  195793.25  1386542.00  12753375.12  7420359.00  1293859294.00  652.0
thinness_1_19_years               0.10       1.60        3.30         4.84        7.20          27.70   34.0
thinness_5_9_years                0.10       1.50        3.30         4.87        7.20          28.60   34.0
income_composition_of_resources   0.00       0.49        0.68         0.63        0.78           0.95  167.0
schooling                         0.00      10.10       12.30        11.99       14.30          20.70  163.0

Only 10 observations have a missing value for the response, life_expectancy. We drop these 10 observations, since removing so few rows should make little difference to the models we fit.

## [1] 2928

There are still 1279 observations with at least one missing value. We impute some of these values using the mean of the variable for the given country:

## [1] 800

Some observations still have missing values. Next, we impute these using the mean of the variable for the given region in the given year:

## [1] 0
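The cleaning steps described above (drop rows with a missing response, impute by country mean, then impute by region-and-year mean) can be sketched in base R as follows. This is an illustration on a small made-up data frame, not our actual cleaning code; the helper `impute_group_mean` and the `toy` data are assumptions introduced for the example.

```r
# Hypothetical helper (an assumption): fill NAs in `x` with the mean of the
# observed values in the same group.
impute_group_mean <- function(x, group) {
  group_mean <- ave(x, group, FUN = function(v) mean(v, na.rm = TRUE))
  # Groups with no observed values give NaN; leave those entries as NA
  # so that a later, coarser imputation pass can handle them.
  fill <- is.na(x) & !is.nan(group_mean)
  x[fill] <- group_mean[fill]
  x
}

# Made-up toy data, only to show the mechanics.
toy <- data.frame(
  country = c("A", "A", "B", "B", "C", "C"),
  region  = c("R1", "R1", "R1", "R1", "R2", "R2"),
  year    = c(2014, 2015, 2014, 2015, 2014, 2015),
  life_expectancy = c(60, 61, 70, NA, 75, 76),
  gdp     = c(500, NA, NA, NA, 900, 950)
)

# Step 1: drop observations with a missing response.
toy <- toy[!is.na(toy$life_expectancy), ]

# Step 2: impute with the per-country mean.
toy$gdp <- impute_group_mean(toy$gdp, toy$country)
n_missing_after_country <- sum(!complete.cases(toy))

# Step 3: impute what is left with the per-region, per-year mean.
toy$gdp <- impute_group_mean(toy$gdp, paste(toy$region, toy$year))
n_missing_after_region_year <- sum(!complete.cases(toy))

print(c(n_missing_after_country, n_missing_after_region_year))
```

Country B has no observed `gdp` in the toy data, so the country-level pass cannot fill it and the region-and-year pass is needed, which mirrors why two passes are used on the real data.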

All missing values have now been imputed, and our final dataset has 2928 observations.

Data exploration and visualization

  • Basic statistics (by region)
Region                      #Records  Avg. Life Expectancy  Avg. Infant Deaths  Avg. Adult Deaths
East Asia & Pacific              422              71.34231           25.265403          137.62260
Europe & Central Asia            770              75.95456            2.724675          109.26432
Latin America & Caribbean        498              73.07319            7.339357          135.32661
Middle East & North Africa       320              73.16312           11.281250          105.65625
North America                     32              79.87500           14.093750           61.40625
South Asia                       128              67.37422          250.039062          164.50781
Sub-Saharan Africa               768              57.08685           47.593750          283.07812
##   continent count mean_life_expectancy mean_infant_deaths mean_adult_deaths
## 1    Africa   864                57.80          44.246528         266.57176
## 2  Americas   530                73.90           7.747170         130.84659
## 3      Asia   752                72.55          60.875000         133.43750
## 4    Europe   626                77.80           1.172524          98.01282
## 5   Oceania   166                69.40           1.120482         135.08750

##                                 life_expectancy adult_mortality infant_deaths
## life_expectancy                             1.0            -0.7          -0.2
## adult_mortality                            -0.7             1.0           0.0
## infant_deaths                              -0.2             0.0           1.0
## alcohol                                     0.4            -0.2          -0.1
## percentage_expenditure                      0.4            -0.2          -0.1
## hepatitis_b                                 0.2            -0.1          -0.2
## measles                                    -0.1             0.0           0.5
## bmi                                         0.5            -0.4          -0.2
## under_five_deaths                          -0.2             0.1           1.0
## polio                                       0.3            -0.2          -0.2
## total_expenditure                           0.2            -0.1          -0.1
## diphtheria                                  0.3            -0.2          -0.2
## hiv_aids                                   -0.6             0.6           0.0
## gdp                                         0.4            -0.3          -0.1
## population                                  0.0             0.0           0.7
## thinness_1_19_years                        -0.5             0.3           0.5
## thinness_5_9_years                         -0.5             0.3           0.5
## income_composition_of_resources             0.7            -0.4          -0.1
## schooling                                   0.7            -0.4          -0.2
##                                 alcohol percentage_expenditure hepatitis_b
## life_expectancy                     0.4                    0.4         0.2
## adult_mortality                    -0.2                   -0.2        -0.1
## infant_deaths                      -0.1                   -0.1        -0.2
## alcohol                             1.0                    0.4         0.1
## percentage_expenditure              0.4                    1.0         0.0
## hepatitis_b                         0.1                    0.0         1.0
## measles                            -0.1                   -0.1        -0.1
## bmi                                 0.4                    0.2         0.1
## under_five_deaths                  -0.1                   -0.1        -0.2
## polio                               0.2                    0.1         0.5
## total_expenditure                   0.2                    0.2         0.1
## diphtheria                          0.2                    0.1         0.6
## hiv_aids                            0.0                   -0.1        -0.1
## gdp                                 0.4                    1.0         0.0
## population                          0.0                    0.0        -0.1
## thinness_1_19_years                -0.4                   -0.3        -0.1
## thinness_5_9_years                 -0.4                   -0.3        -0.1
## income_composition_of_resources     0.6                    0.4         0.2
## schooling                           0.6                    0.4         0.2
##                                 measles  bmi under_five_deaths polio
## life_expectancy                    -0.1  0.5              -0.2   0.3
## adult_mortality                     0.0 -0.4               0.1  -0.2
## infant_deaths                       0.5 -0.2               1.0  -0.2
## alcohol                            -0.1  0.4              -0.1   0.2
## percentage_expenditure             -0.1  0.2              -0.1   0.1
## hepatitis_b                        -0.1  0.1              -0.2   0.5
## measles                             1.0 -0.2               0.5  -0.1
## bmi                                -0.2  1.0              -0.2   0.2
## under_five_deaths                   0.5 -0.2               1.0  -0.2
## polio                              -0.1  0.2              -0.2   1.0
## total_expenditure                  -0.1  0.2              -0.1   0.1
## diphtheria                         -0.1  0.2              -0.2   0.6
## hiv_aids                            0.0 -0.2               0.0  -0.1
## gdp                                -0.1  0.3              -0.1   0.2
## population                          0.3 -0.1               0.7   0.0
## thinness_1_19_years                 0.2 -0.5               0.5  -0.2
## thinness_5_9_years                  0.2 -0.6               0.5  -0.2
## income_composition_of_resources    -0.1  0.5              -0.1   0.3
## schooling                          -0.1  0.6              -0.2   0.4
##                                 total_expenditure diphtheria hiv_aids  gdp
## life_expectancy                               0.2        0.3     -0.6  0.4
## adult_mortality                              -0.1       -0.2      0.6 -0.3
## infant_deaths                                -0.1       -0.2      0.0 -0.1
## alcohol                                       0.2        0.2      0.0  0.4
## percentage_expenditure                        0.2        0.1     -0.1  1.0
## hepatitis_b                                   0.1        0.6     -0.1  0.0
## measles                                      -0.1       -0.1      0.0 -0.1
## bmi                                           0.2        0.2     -0.2  0.3
## under_five_deaths                            -0.1       -0.2      0.0 -0.1
## polio                                         0.1        0.6     -0.1  0.2
## total_expenditure                             1.0        0.1      0.0  0.2
## diphtheria                                    0.1        1.0     -0.1  0.2
## hiv_aids                                      0.0       -0.1      1.0 -0.1
## gdp                                           0.2        0.2     -0.1  1.0
## population                                   -0.1        0.0      0.0  0.0
## thinness_1_19_years                          -0.2       -0.2      0.2 -0.3
## thinness_5_9_years                           -0.2       -0.2      0.2 -0.3
## income_composition_of_resources               0.2        0.3     -0.2  0.4
## schooling                                     0.2        0.4     -0.2  0.5
##                                 population thinness_1_19_years
## life_expectancy                        0.0                -0.5
## adult_mortality                        0.0                 0.3
## infant_deaths                          0.7                 0.5
## alcohol                                0.0                -0.4
## percentage_expenditure                 0.0                -0.3
## hepatitis_b                           -0.1                -0.1
## measles                                0.3                 0.2
## bmi                                   -0.1                -0.5
## under_five_deaths                      0.7                 0.5
## polio                                  0.0                -0.2
## total_expenditure                     -0.1                -0.2
## diphtheria                             0.0                -0.2
## hiv_aids                               0.0                 0.2
## gdp                                    0.0                -0.3
## population                             1.0                 0.3
## thinness_1_19_years                    0.3                 1.0
## thinness_5_9_years                     0.3                 0.9
## income_composition_of_resources        0.0                -0.5
## schooling                              0.0                -0.5
##                                 thinness_5_9_years
## life_expectancy                               -0.5
## adult_mortality                                0.3
## infant_deaths                                  0.5
## alcohol                                       -0.4
## percentage_expenditure                        -0.3
## hepatitis_b                                   -0.1
## measles                                        0.2
## bmi                                           -0.6
## under_five_deaths                              0.5
## polio                                         -0.2
## total_expenditure                             -0.2
## diphtheria                                    -0.2
## hiv_aids                                       0.2
## gdp                                           -0.3
## population                                     0.3
## thinness_1_19_years                            0.9
## thinness_5_9_years                             1.0
## income_composition_of_resources               -0.4
## schooling                                     -0.5
##                                 income_composition_of_resources schooling
## life_expectancy                                             0.7       0.7
## adult_mortality                                            -0.4      -0.4
## infant_deaths                                              -0.1      -0.2
## alcohol                                                     0.6       0.6
## percentage_expenditure                                      0.4       0.4
## hepatitis_b                                                 0.2       0.2
## measles                                                    -0.1      -0.1
## bmi                                                         0.5       0.6
## under_five_deaths                                          -0.1      -0.2
## polio                                                       0.3       0.4
## total_expenditure                                           0.2       0.2
## diphtheria                                                  0.3       0.4
## hiv_aids                                                   -0.2      -0.2
## gdp                                                         0.4       0.5
## population                                                  0.0       0.0
## thinness_1_19_years                                        -0.5      -0.5
## thinness_5_9_years                                         -0.4      -0.5
## income_composition_of_resources                             1.0       0.8
## schooling                                                   0.8       1.0
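A correlation matrix rounded to one decimal place, like the one printed above, can be produced with base R's `cor()` and `round()`. The sketch below uses simulated data (the columns `a`, `b`, `c` are made up for illustration) and is not the code behind the output above.

```r
set.seed(420)
n <- 50
x <- rnorm(n)

# Simulated toy data: `b` is almost a mirror image of `a`, `c` is unrelated.
toy <- data.frame(
  a = x,
  b = -x + rnorm(n, sd = 0.1),
  c = rnorm(n)
)

# Keep numeric columns only, then round the pairwise correlations.
numeric_cols <- sapply(toy, is.numeric)
corr <- round(cor(toy[, numeric_cols], use = "pairwise.complete.obs"), 1)
print(corr)
```

Note that rounding to one decimal place can display a very strong but imperfect correlation as exactly 1.0 or -1.0, which is worth keeping in mind when reading entries such as the one between infant_deaths and under_five_deaths.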


Model Building

Splitting the data into training and test sets (90% training, 10% hold-out test set):
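A minimal sketch of a random 90/10 split in base R is shown here on made-up data; the seed, the use of `floor()`, and the object names are assumptions rather than our exact code. With 2928 observations, a 90% share is 2635 rows, which is consistent with the model summaries later in this section (2615 residual degrees of freedom plus 20 estimated coefficients in the full additive model).

```r
set.seed(420)  # assumed seed, only to make the sketch reproducible

# Made-up toy data standing in for the cleaned dataset.
toy <- data.frame(id = 1:100, y = rnorm(100))

# Sample 90% of the row indices for training; the rest is held out.
n_train   <- floor(0.9 * nrow(toy))
train_idx <- sample(nrow(toy), size = n_train)

train_df <- toy[train_idx, ]
test_df  <- toy[-train_idx, ]

print(c(train = nrow(train_df), test = nrow(test_df)))
```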

For now we ignore all the categorical variables except status. (We also fitted models using some of the other categorical variables but could not get better results; the code can be seen in the Appendix.)

We started by fitting a full additive model (with all the numerical predictors and status). This gives us a good baseline model from which to do simple as well as more nuanced feature selection later.

  • Summary of the full additive model
## 
## Call:
## lm(formula = life_expectancy ~ ., data = non_cat_predictor_df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -21.6482  -2.2808  -0.1263   2.2784  17.4919 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      5.521e+01  6.664e-01  82.838  < 2e-16 ***
## statusDeveloping                -1.243e+00  2.834e-01  -4.386 1.20e-05 ***
## adult_mortality                 -1.818e-02  8.274e-04 -21.975  < 2e-16 ***
## infant_deaths                    9.078e-02  8.753e-03  10.371  < 2e-16 ***
## alcohol                          3.047e-02  2.696e-02   1.130   0.2585    
## percentage_expenditure           1.562e-04  7.757e-05   2.014   0.0441 *  
## hepatitis_b                     -2.051e-03  4.082e-03  -0.502   0.6155    
## measles                         -1.497e-05  7.809e-06  -1.917   0.0553 .  
## bmi                              3.719e-02  5.170e-03   7.192 8.27e-13 ***
## under_five_deaths               -6.775e-02  6.406e-03 -10.575  < 2e-16 ***
## polio                            2.547e-02  4.731e-03   5.384 7.92e-08 ***
## total_expenditure                1.127e-02  3.447e-02   0.327   0.7437    
## diphtheria                       3.305e-02  5.048e-03   6.547 7.04e-11 ***
## hiv_aids                        -4.746e-01  1.777e-02 -26.709  < 2e-16 ***
## gdp                              2.965e-05  1.195e-05   2.481   0.0132 *  
## population                       6.525e-10  1.900e-09   0.343   0.7313    
## thinness_1_19_years             -7.204e-02  4.970e-02  -1.449   0.1473    
## thinness_5_9_years              -7.721e-03  4.899e-02  -0.158   0.8748    
## income_composition_of_resources  6.297e+00  6.552e-01   9.611  < 2e-16 ***
## schooling                        7.378e-01  4.511e-02  16.355  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.959 on 2615 degrees of freedom
## Multiple R-squared:  0.828,  Adjusted R-squared:  0.8267 
## F-statistic: 662.3 on 19 and 2615 DF,  p-value: < 2.2e-16
  • Diagnostics plots for full additive model

  • Test \(\mathbf{\text{RMSE}} = 3.7453062\)
  • \(R^2 = 0.8279511\)
  • This model shows that there are some non-significant predictors in the full model. For example, for alcohol, using a t-test for significance:
    • Null: \(H_0:\) \(\beta_{alcohol} = 0\)
    • Alternative: \(H_1:\) \(\beta_{alcohol} \neq 0\)
    • Test statistic: \(1.1301202\)
    • P-value: \(0.2585292\)
    • Decision: Fail to reject the null at any conventional significance level
    • Conclusion: Given the other predictors in the model, alcohol does not have a significant linear relationship with life_expectancy
  • Also, we can see from the diagnostic plots that both the equal variance assumption and the normality assumption are suspect.
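As a check on the t-test for alcohol above, the test statistic and two-sided p-value can be recomputed from the coefficient table of the full additive model. The sketch below uses the rounded estimate and standard error printed in the summary, so it agrees with the reported values only up to rounding.

```r
# Values copied from the full additive model summary (rounded as printed).
estimate  <- 3.047e-02  # coefficient for alcohol
std_error <- 2.696e-02  # its standard error
df_resid  <- 2615       # residual degrees of freedom

# t statistic under H0: beta_alcohol = 0
t_stat <- estimate / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(abs(t_stat), df = df_resid, lower.tail = FALSE)

print(c(t_stat = t_stat, p_value = p_value))
```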

We therefore started with the simple (though not recommended) approach of removing some of the least significant predictors. There also appears to be strong collinearity between infant_deaths and under_five_deaths (see the variance inflation factors below and the correlation matrix shown earlier).

##                          status                 adult_mortality 
##                        1.949467                        1.756053 
##                   infant_deaths                         alcohol 
##                      165.233443                        1.982791 
##          percentage_expenditure                     hepatitis_b 
##                        4.085965                        1.691643 
##                         measles                             bmi 
##                        1.372574                        1.795997 
##               under_five_deaths                           polio 
##                      165.237301                        2.038419 
##               total_expenditure                      diphtheria 
##                        1.202812                        2.389284 
##                        hiv_aids                             gdp 
##                        1.396273                        4.412362 
##                      population             thinness_1_19_years 
##                        1.555639                        8.034095 
##              thinness_5_9_years income_composition_of_resources 
##                        8.115884                        3.156427 
##                       schooling 
##                        3.775685
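For reference, the variance inflation factor of predictor \(j\) is \(\text{VIF}_j = 1 / (1 - R_j^2)\), where \(R_j^2\) is the \(R^2\) from regressing predictor \(j\) on all the other predictors. The sketch below computes this by hand on simulated data (the variables `x1`, `x2`, `x3` are made up); it illustrates the definition and is not necessarily how the values above were produced.

```r
set.seed(420)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)  # nearly a copy of x1, so highly collinear
x3 <- rnorm(n)                 # unrelated to the other two
predictors <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing x_j on the others.
vif_manual <- sapply(names(predictors), function(v) {
  others <- setdiff(names(predictors), v)
  fit    <- lm(reformulate(others, response = v), data = predictors)
  1 / (1 - summary(fit)$r.squared)
})

print(round(vif_manual, 2))
```

In this toy example `x1` and `x2` get very large values while `x3` stays close to 1, the same pattern seen above for infant_deaths and under_five_deaths compared with the remaining predictors.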

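So we removed some of the least significant predictors and, of the two collinear predictors, kept infant_deaths and dropped under_five_deaths. The model below also includes an interaction between income_composition_of_resources and status: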

## 
## Call:
## lm(formula = life_expectancy ~ adult_mortality + infant_deaths + 
##     bmi + diphtheria + hiv_aids + gdp + income_composition_of_resources * 
##     status + schooling, data = non_cat_predictor_df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -21.9816  -2.2703  -0.0947   2.4075  18.9792 
## 
## Coefficients:
##                                                    Estimate Std. Error t value
## (Intercept)                                       4.604e+01  3.066e+00  15.017
## adult_mortality                                  -1.860e-02  8.439e-04 -22.047
## infant_deaths                                    -2.677e-03  7.299e-04  -3.667
## bmi                                               4.484e-02  4.994e-03   8.979
## diphtheria                                        5.473e-02  3.811e-03  14.360
## hiv_aids                                         -4.908e-01  1.806e-02 -27.176
## gdp                                               4.063e-05  7.465e-06   5.442
## income_composition_of_resources                   1.669e+01  3.706e+00   4.505
## statusDeveloping                                  6.624e+00  3.040e+00   2.179
## schooling                                         7.792e-01  4.482e-02  17.385
## income_composition_of_resources:statusDeveloping -9.567e+00  3.631e+00  -2.635
##                                                  Pr(>|t|)    
## (Intercept)                                       < 2e-16 ***
## adult_mortality                                   < 2e-16 ***
## infant_deaths                                     0.00025 ***
## bmi                                               < 2e-16 ***
## diphtheria                                        < 2e-16 ***
## hiv_aids                                          < 2e-16 ***
## gdp                                              5.74e-08 ***
## income_composition_of_resources                  6.94e-06 ***
## statusDeveloping                                  0.02944 *  
## schooling                                         < 2e-16 ***
## income_composition_of_resources:statusDeveloping  0.00846 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.076 on 2624 degrees of freedom
## Multiple R-squared:  0.817,  Adjusted R-squared:  0.8163 
## F-statistic:  1171 on 10 and 2624 DF,  p-value: < 2.2e-16
  • Diagnostic plot:

  • Test \(\mathbf{\text{RMSE}} = 3.9150261\)
  • \(R^2 = 0.8170003\)
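The test RMSE reported for each model is the root mean squared error of the predictions on the hold-out set. A minimal sketch of that calculation follows; the helper name `rmse` and the simulated data are assumptions made for illustration.

```r
# Hypothetical helper (an assumption): root mean squared error.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

set.seed(420)

# Simulated toy data with a known linear relationship.
toy   <- data.frame(x = runif(100))
toy$y <- 2 + 3 * toy$x + rnorm(100, sd = 0.5)

# Fit on the first 90 rows, evaluate on the last 10.
train_df <- toy[1:90, ]
test_df  <- toy[91:100, ]
fit      <- lm(y ~ x, data = train_df)

test_rmse <- rmse(test_df$y, predict(fit, newdata = test_df))
print(test_rmse)
```

Because RMSE is on the scale of the response, the values in this report can be read as a typical prediction error in years of life expectancy.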
## 
## Call:
## lm(formula = life_expectancy ~ (adult_mortality + under_five_deaths + 
##     bmi + diphtheria + hiv_aids + gdp + income_composition_of_resources + 
##     schooling)^2, data = non_cat_predictor_df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -20.8667  -2.0800  -0.0783   2.0935  14.9944 
## 
## Coefficients:
##                                                     Estimate Std. Error t value
## (Intercept)                                        4.560e+01  1.566e+00  29.119
## adult_mortality                                   -3.429e-03  3.164e-03  -1.084
## under_five_deaths                                  9.951e-03  4.195e-03   2.372
## bmi                                                3.534e-01  3.022e-02  11.692
## diphtheria                                         1.465e-01  1.677e-02   8.733
## hiv_aids                                          -1.186e+00  1.378e-01  -8.607
## gdp                                                4.622e-04  9.661e-05   4.784
## income_composition_of_resources                   -1.458e+01  2.907e+00  -5.016
## schooling                                          1.338e+00  1.875e-01   7.137
## adult_mortality:under_five_deaths                 -1.128e-06  5.387e-06  -0.209
## adult_mortality:bmi                                3.349e-05  5.962e-05   0.562
## adult_mortality:diphtheria                        -9.966e-05  3.239e-05  -3.076
## adult_mortality:hiv_aids                           8.373e-04  6.911e-05  12.115
## adult_mortality:gdp                               -5.788e-08  1.717e-07  -0.337
## adult_mortality:income_composition_of_resources   -7.377e-03  7.182e-03  -1.027
## adult_mortality:schooling                         -8.181e-04  4.551e-04  -1.798
## under_five_deaths:bmi                             -3.054e-04  1.111e-04  -2.749
## under_five_deaths:diphtheria                      -1.373e-05  2.680e-05  -0.512
## under_five_deaths:hiv_aids                        -6.917e-04  3.445e-04  -2.008
## under_five_deaths:gdp                             -2.036e-07  5.638e-07  -0.361
## under_five_deaths:income_composition_of_resources  1.716e-02  6.180e-03   2.777
## under_five_deaths:schooling                       -1.339e-03  5.273e-04  -2.539
## bmi:diphtheria                                    -1.487e-03  2.222e-04  -6.691
## bmi:hiv_aids                                       4.319e-03  1.999e-03   2.161
## bmi:gdp                                           -2.412e-07  3.413e-07  -0.707
## bmi:income_composition_of_resources                1.020e-02  3.377e-02   0.302
## bmi:schooling                                     -1.633e-02  2.264e-03  -7.214
## diphtheria:hiv_aids                               -9.062e-04  9.076e-04  -0.999
## diphtheria:gdp                                     2.030e-07  5.526e-07   0.367
## diphtheria:income_composition_of_resources         6.137e-02  2.334e-02   2.630
## diphtheria:schooling                              -6.664e-03  1.693e-03  -3.936
## hiv_aids:gdp                                      -3.389e-05  9.961e-06  -3.402
## hiv_aids:income_composition_of_resources           2.017e+00  3.418e-01   5.901
## hiv_aids:schooling                                -5.147e-02  1.627e-02  -3.164
## gdp:income_composition_of_resources               -4.159e-04  1.052e-04  -3.954
## gdp:schooling                                     -3.872e-06  3.694e-06  -1.048
## income_composition_of_resources:schooling          1.435e+00  1.112e-01  12.909
##                                                   Pr(>|t|)    
## (Intercept)                                        < 2e-16 ***
## adult_mortality                                   0.278493    
## under_five_deaths                                 0.017768 *  
## bmi                                                < 2e-16 ***
## diphtheria                                         < 2e-16 ***
## hiv_aids                                           < 2e-16 ***
## gdp                                               1.81e-06 ***
## income_composition_of_resources                   5.64e-07 ***
## schooling                                         1.23e-12 ***
## adult_mortality:under_five_deaths                 0.834099    
## adult_mortality:bmi                               0.574404    
## adult_mortality:diphtheria                        0.002117 ** 
## adult_mortality:hiv_aids                           < 2e-16 ***
## adult_mortality:gdp                               0.736077    
## adult_mortality:income_composition_of_resources   0.304431    
## adult_mortality:schooling                         0.072369 .  
## under_five_deaths:bmi                             0.006011 ** 
## under_five_deaths:diphtheria                      0.608422    
## under_five_deaths:hiv_aids                        0.044778 *  
## under_five_deaths:gdp                             0.718055    
## under_five_deaths:income_composition_of_resources 0.005521 ** 
## under_five_deaths:schooling                       0.011162 *  
## bmi:diphtheria                                    2.70e-11 ***
## bmi:hiv_aids                                      0.030771 *  
## bmi:gdp                                           0.479841    
## bmi:income_composition_of_resources               0.762627    
## bmi:schooling                                     7.12e-13 ***
## diphtheria:hiv_aids                               0.318122    
## diphtheria:gdp                                    0.713441    
## diphtheria:income_composition_of_resources        0.008597 ** 
## diphtheria:schooling                              8.49e-05 ***
## hiv_aids:gdp                                      0.000679 ***
## hiv_aids:income_composition_of_resources          4.08e-09 ***
## hiv_aids:schooling                                0.001573 ** 
## gdp:income_composition_of_resources               7.88e-05 ***
## gdp:schooling                                     0.294661    
## income_composition_of_resources:schooling          < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.564 on 2598 degrees of freedom
## Multiple R-squared:  0.8615, Adjusted R-squared:  0.8596 
## F-statistic: 448.9 on 36 and 2598 DF,  p-value: < 2.2e-16
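
The 36 numerator degrees of freedom in the F-statistic above equal 8 main effects plus choose(8, 2) = 28 pairwise interactions. That is the term set R's `(a + b + ...)^2` formula shorthand produces. The sketch below shows the shorthand on synthetic data. The data frame `toy` and its columns are hypothetical, and we are assuming (not quoting) that the model above was specified this way.

```r
set.seed(420)
n <- 200

# Hypothetical predictors and response, only to illustrate the formula syntax
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- 1 + 2 * toy$x1 - toy$x2 + 0.5 * toy$x1 * toy$x3 + rnorm(n)

# (x1 + x2 + x3)^2 expands to all main effects plus all two-way interactions
fit_int <- lm(y ~ (x1 + x2 + x3)^2, data = toy)

# Number of non-intercept terms: 3 main effects + choose(3, 2) interactions
n_terms <- length(coef(fit_int)) - 1
n_terms
```

With eight predictors the same count gives 8 + 28 = 36. Many of the interaction rows above have large p-values, so a smaller model may fit almost as well.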

## [1] 3.362962

## [1] 4.766326

## 
## Call:
## lm(formula = life_expectancy^3 ~ income_composition_of_resources + 
##     adult_mortality + bmi + status + under_five_deaths, data = non_cat_predictor_df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -264191  -33656   -3659   31225  340598 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     281961.262   7959.723  35.424  < 2e-16 ***
## income_composition_of_resources 235398.426   8317.241  28.302  < 2e-16 ***
## adult_mortality                   -388.140     12.330 -31.478  < 2e-16 ***
## bmi                                989.076     79.225  12.484  < 2e-16 ***
## statusDeveloping                -61380.933   3979.585 -15.424  < 2e-16 ***
## under_five_deaths                  -53.334      8.747  -6.097 1.24e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 67240 on 2629 degrees of freedom
## Multiple R-squared:  0.7294, Adjusted R-squared:  0.7289 
## F-statistic:  1417 on 5 and 2629 DF,  p-value: < 2.2e-16
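
Because this model is fit to `life_expectancy^3`, its residual standard error (67240) is in cubed years and cannot be compared directly with the 3 to 5 year errors of the untransformed models. The single value printed after this summary (363332.2) is of the same order as `life_expectancy^3` itself. That would be consistent with an error metric computed between responses on the original scale and predictions on the cubed scale, though the output does not show how it was computed. The sketch below, on synthetic data with hypothetical names (`toy`, `life`, `rmse`), shows one way to back-transform predictions before computing an error in years.

```r
set.seed(420)
n <- 300

# Hypothetical data: a response in "years" driven by one predictor
toy <- data.frame(x = runif(n, 0, 1))
toy$life <- 50 + 25 * toy$x + rnorm(n, sd = 2)

rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Model fit on the cubed response
fit_cube   <- lm(life^3 ~ x, data = toy)
pred_cubed <- predict(fit_cube, newdata = toy)

# Back-transform with a sign-safe cube root so predictions are in years again
pred_years <- sign(pred_cubed) * abs(pred_cubed)^(1 / 3)

rmse_cubed_scale <- rmse(toy$life^3, pred_cubed)  # cubed years
rmse_mismatched  <- rmse(toy$life,   pred_cubed)  # mixes scales: not meaningful
rmse_years       <- rmse(toy$life,   pred_years)  # comparable across models
```

Only `rmse_years` is on the scale used by the other models in this report.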

## [1] 363332.2
## 
## Call:
## lm(formula = life_expectancy ~ schooling + bmi + alcohol + gdp + 
##     hiv_aids + diphtheria + status, data = non_cat_predictor_df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -26.9508  -2.8516  -0.0173   2.8691  21.4125 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       4.884e+01  5.877e-01  83.102  < 2e-16 ***
## schooling         1.275e+00  4.028e-02  31.649  < 2e-16 ***
## bmi               6.391e-02  5.458e-03  11.709  < 2e-16 ***
## alcohol          -4.988e-02  2.959e-02  -1.686    0.092 .  
## gdp               6.939e-05  7.751e-06   8.953  < 2e-16 ***
## hiv_aids         -6.794e-01  1.816e-02 -37.403  < 2e-16 ***
## diphtheria        6.523e-02  4.186e-03  15.585  < 2e-16 ***
## statusDeveloping -2.146e+00  3.201e-01  -6.703 2.49e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.559 on 2627 degrees of freedom
## Multiple R-squared:  0.7708, Adjusted R-squared:  0.7702 
## F-statistic:  1262 on 7 and 2627 DF,  p-value: < 2.2e-16

## [1] 4.39782
## Start:  AIC=8003.26
## life_expectancy ~ schooling + bmi + alcohol + gdp + hiv_aids + 
##     diphtheria + status
## 
##              Df Sum of Sq   RSS    AIC   F value    Pr(>F)    
## <none>                    54604 8003.3                        
## - alcohol     1      59.1 54663 8004.1    2.8412     0.092 .  
## - status      1     933.9 55538 8045.9   44.9326 2.487e-11 ***
## - gdp         1    1666.1 56270 8080.5   80.1564 < 2.2e-16 ***
## - bmi         1    2849.8 57453 8135.3  137.1063 < 2.2e-16 ***
## - diphtheria  1    5048.8 59652 8234.3  242.9016 < 2.2e-16 ***
## - schooling   1   20820.0 75424 8852.4 1001.6563 < 2.2e-16 ***
## - hiv_aids    1   29079.2 83683 9126.2 1399.0111 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Call:
## lm(formula = life_expectancy ~ schooling + bmi + gdp * status + 
##     hiv_aids + diphtheria, data = non_cat_predictor_df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -26.9932  -2.9250   0.0812   2.9494  20.7958 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           4.972e+01  6.390e-01  77.815  < 2e-16 ***
## schooling             1.234e+00  3.892e-02  31.715  < 2e-16 ***
## bmi                   6.029e-02  5.495e-03  10.972  < 2e-16 ***
## gdp                   4.706e-05  9.683e-06   4.860 1.24e-06 ***
## statusDeveloping     -2.742e+00  3.593e-01  -7.631 3.23e-14 ***
## hiv_aids             -6.809e-01  1.797e-02 -37.887  < 2e-16 ***
## diphtheria            6.451e-02  4.181e-03  15.430  < 2e-16 ***
## gdp:statusDeveloping  6.133e-05  1.588e-05   3.862 0.000115 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.549 on 2627 degrees of freedom
## Multiple R-squared:  0.7718, Adjusted R-squared:  0.7712 
## F-statistic:  1269 on 7 and 2627 DF,  p-value: < 2.2e-16

## [1] 4.370292
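
In the `gdp * status` model above, the `gdp` coefficient is the slope for the reference level (Developed). The slope for Developing countries is the sum of the `gdp` and `gdp:statusDeveloping` estimates. The arithmetic below uses only the printed estimates. Scaling to a 10,000-unit change assumes `gdp` is in the dataset's original units.

```r
# Estimates copied from the summary above
b_gdp <- 4.706e-05   # gdp (slope for Developed, the reference level)
b_int <- 6.133e-05   # gdp:statusDeveloping (difference in slope)

slope_developed  <- b_gdp
slope_developing <- b_gdp + b_int

# Estimated change in life expectancy per 10,000-unit increase in gdp
per_10k <- 1e4 * c(Developed = slope_developed, Developing = slope_developing)
per_10k
```

Holding the other predictors fixed, the estimated association with `gdp` is therefore a little more than twice as steep for Developing countries.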

## 
## Call:
## lm(formula = life_expectancy ~ (schooling + bmi + gdp + hiv_aids + 
##     diphtheria) * status, data = le_trn_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -27.2444  -2.7806   0.0821   2.7248  21.0926 
## 
## Coefficients: (1 not defined because of singularities)
##                               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  6.400e+01  2.818e+00  22.708  < 2e-16 ***
## schooling                    8.323e-01  1.284e-01   6.483 1.07e-10 ***
## bmi                         -1.775e-02  1.238e-02  -1.434  0.15155    
## gdp                          5.041e-05  9.706e-06   5.193 2.23e-07 ***
## hiv_aids                    -6.709e-01  1.778e-02 -37.737  < 2e-16 ***
## diphtheria                   2.186e-02  1.729e-02   1.264  0.20636    
## statusDeveloping            -1.745e+01  2.852e+00  -6.118 1.09e-09 ***
## schooling:statusDeveloping   3.814e-01  1.349e-01   2.828  0.00472 ** 
## bmi:statusDeveloping         9.622e-02  1.378e-02   6.984 3.63e-12 ***
## gdp:statusDeveloping         4.708e-05  1.590e-05   2.962  0.00309 ** 
## hiv_aids:statusDeveloping           NA         NA      NA       NA    
## diphtheria:statusDeveloping  4.321e-02  1.782e-02   2.425  0.01539 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.489 on 2624 degrees of freedom
## Multiple R-squared:  0.778,  Adjusted R-squared:  0.7772 
## F-statistic: 919.6 on 10 and 2624 DF,  p-value: < 2.2e-16
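
The `NA` row for `hiv_aids:statusDeveloping` and the note "1 not defined because of singularities" mean that column of the design matrix is an exact linear combination of the others. One common cause is a predictor that takes a single value within one level of the factor. Whether that holds for `hiv_aids` among Developed countries in this data is an assumption to check, not something the output confirms. The sketch below reproduces the effect on synthetic data (all names hypothetical).

```r
set.seed(420)
n <- 200
status <- factor(rep(c("Developed", "Developing"), each = n / 2))

# Hypothetical: the predictor is constant (0.1) in one group, varying in the other
hiv  <- ifelse(status == "Developed", 0.1, runif(n, 0.1, 20))
life <- 80 - 5 * (status == "Developing") - 0.6 * hiv + rnorm(n)
toy  <- data.frame(life, hiv, status)

fit <- lm(life ~ hiv * status, data = toy)

# The interaction cannot be estimated separately, so R reports NA for it
coef(fit)

# Number of distinct predictor values per group; 1 means no within-group variation
distinct_per_group <- tapply(toy$hiv, toy$status, function(v) length(unique(v)))

# alias() lists which coefficients are linearly dependent on the others
alias(fit)$Complete
```

Running the `tapply()` check on the real data would show whether the same explanation applies here.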

## Start:  AIC=7924.86
## life_expectancy ~ (schooling + bmi + gdp + hiv_aids + diphtheria) * 
##     status
## 
## 
## Step:  AIC=7924.86
## life_expectancy ~ schooling + bmi + gdp + hiv_aids + diphtheria + 
##     status + schooling:status + bmi:status + gdp:status + diphtheria:status
## 
##                     Df Sum of Sq   RSS    AIC
## <none>                           52882 7924.9
## - diphtheria:status  1     118.5 53001 7928.8
## - schooling:status   1     161.2 53044 7930.9
## - gdp:status         1     176.8 53059 7931.6
## - bmi:status         1     982.9 53865 7971.4
## - hiv_aids           1   28699.7 81582 9065.2
## 
## Call:
## lm(formula = life_expectancy ~ schooling + bmi + gdp + hiv_aids + 
##     diphtheria + status + schooling:status + bmi:status + gdp:status + 
##     diphtheria:status, data = le_trn_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -27.2444  -2.7806   0.0821   2.7248  21.0926 
## 
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  6.400e+01  2.818e+00  22.708  < 2e-16 ***
## schooling                    8.323e-01  1.284e-01   6.483 1.07e-10 ***
## bmi                         -1.775e-02  1.238e-02  -1.434  0.15155    
## gdp                          5.041e-05  9.706e-06   5.193 2.23e-07 ***
## hiv_aids                    -6.709e-01  1.778e-02 -37.737  < 2e-16 ***
## diphtheria                   2.186e-02  1.729e-02   1.264  0.20636    
## statusDeveloping            -1.745e+01  2.852e+00  -6.118 1.09e-09 ***
## schooling:statusDeveloping   3.814e-01  1.349e-01   2.828  0.00472 ** 
## bmi:statusDeveloping         9.622e-02  1.378e-02   6.984 3.63e-12 ***
## gdp:statusDeveloping         4.708e-05  1.590e-05   2.962  0.00309 ** 
## diphtheria:statusDeveloping  4.321e-02  1.782e-02   2.425  0.01539 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.489 on 2624 degrees of freedom
## Multiple R-squared:  0.778,  Adjusted R-squared:  0.7772 
## F-statistic: 919.6 on 10 and 2624 DF,  p-value: < 2.2e-16

## 
## Call:
## lm(formula = life_expectancy ~ adult_mortality + bmi + hiv_aids + 
##     income_composition_of_resources + schooling, data = le_trn_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -21.5054  -2.1955  -0.1338   2.2222  23.3580 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     53.8491547  0.4361749 123.458   <2e-16 ***
## adult_mortality                 -0.0201389  0.0008887 -22.662   <2e-16 ***
## bmi                              0.0504595  0.0052085   9.688   <2e-16 ***
## hiv_aids                        -0.4867176  0.0191328 -25.439   <2e-16 ***
## income_composition_of_resources  9.0596858  0.6923758  13.085   <2e-16 ***
## schooling                        0.9977424  0.0451093  22.118   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.322 on 2629 degrees of freedom
## Multiple R-squared:  0.7938, Adjusted R-squared:  0.7934 
## F-statistic:  2024 on 5 and 2629 DF,  p-value: < 2.2e-16

## [1] 2624

## 
## Call:
## lm(formula = life_expectancy ~ adult_mortality + bmi + hiv_aids + 
##     income_composition_of_resources + schooling, data = life_clean)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -18.1504  -2.2166  -0.1487   2.2184  23.1675 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     53.9791795  0.4307309 125.320   <2e-16 ***
## adult_mortality                 -0.0197288  0.0008879 -22.219   <2e-16 ***
## bmi                              0.0504382  0.0051161   9.859   <2e-16 ***
## hiv_aids                        -0.4920254  0.0190763 -25.792   <2e-16 ***
## income_composition_of_resources  8.8965947  0.6799516  13.084   <2e-16 ***
## schooling                        0.9944462  0.0442962  22.450   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.242 on 2618 degrees of freedom
## Multiple R-squared:  0.7952, Adjusted R-squared:  0.7948 
## F-statistic:  2033 on 5 and 2618 DF,  p-value: < 2.2e-16

## 
## Call:
## lm(formula = life_expectancy ~ adult_mortality + under_five_deaths + 
##     bmi + diphtheria + hiv_aids + gdp + income_composition_of_resources + 
##     schooling + status, data = life_clean)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -18.0050  -2.3594  -0.1452   2.3826  18.8654 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      5.434e+01  5.799e-01  93.714  < 2e-16 ***
## adult_mortality                 -1.819e-02  8.404e-04 -21.643  < 2e-16 ***
## under_five_deaths               -2.498e-03  5.253e-04  -4.754 2.10e-06 ***
## bmi                              4.283e-02  4.863e-03   8.807  < 2e-16 ***
## diphtheria                       5.252e-02  3.717e-03  14.131  < 2e-16 ***
## hiv_aids                        -4.950e-01  1.794e-02 -27.587  < 2e-16 ***
## gdp                              4.778e-05  6.855e-06   6.970 4.00e-12 ***
## income_composition_of_resources  6.912e+00  6.522e-01  10.598  < 2e-16 ***
## schooling                        7.872e-01  4.347e-02  18.109  < 2e-16 ***
## statusDeveloping                -1.412e+00  2.538e-01  -5.563 2.92e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.987 on 2614 degrees of freedom
## Multiple R-squared:  0.8193, Adjusted R-squared:  0.8187 
## F-statistic:  1317 on 9 and 2614 DF,  p-value: < 2.2e-16
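
The stepwise trace that follows reports AIC in the form used by `step()` and `extractAIC()` for linear models: n * log(RSS / n) + 2 * (number of estimated coefficients). This omits additive constants, so it differs from `AIC()`. The check below plugs in values read from the trace (n = 2624, starting RSS = 38956, 19 slope terms plus an intercept) and then confirms the identity on a synthetic fit. The synthetic data frame `toy` is hypothetical.

```r
# Values read off the printed output (RSS is rounded in the trace)
n   <- 2624    # 2608 residual df + 16 coefficients in the selected model
rss <- 38956   # RSS of the starting model
edf <- 20      # 19 slope terms + intercept
aic_trace <- n * log(rss / n) + 2 * edf
aic_trace      # close to the reported Start: AIC=7118.86

# Same identity checked against extractAIC() on a synthetic fit
set.seed(420)
toy <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
toy$y <- 1 + toy$x1 + rnorm(100)
fit <- lm(y ~ x1 + x2, data = toy)

manual  <- nrow(toy) * log(sum(resid(fit)^2) / nrow(toy)) + 2 * length(coef(fit))
r_value <- extractAIC(fit)[2]
```

Because n is fixed within a trace, a term is dropped only when the resulting rise in n * log(RSS / n) is less than the 2 units saved in the penalty.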

## Start:  AIC=7118.86
## life_expectancy ~ status + adult_mortality + infant_deaths + 
##     alcohol + percentage_expenditure + hepatitis_b + measles + 
##     bmi + under_five_deaths + polio + total_expenditure + diphtheria + 
##     hiv_aids + gdp + population + thinness_1_19_years + thinness_5_9_years + 
##     income_composition_of_resources + schooling
## 
##                                   Df Sum of Sq   RSS    AIC
## - thinness_5_9_years               1       0.2 38956 7116.9
## - hepatitis_b                      1       0.5 38957 7116.9
## - population                       1       2.6 38959 7117.0
## - alcohol                          1      26.2 38982 7118.6
## <none>                                         38956 7118.9
## - thinness_1_19_years              1      34.7 38991 7119.2
## - total_expenditure                1      39.9 38996 7119.5
## - measles                          1      60.9 39017 7121.0
## - percentage_expenditure           1      66.0 39022 7121.3
## - gdp                              1      97.6 39054 7123.4
## - status                           1     283.0 39239 7135.9
## - polio                            1     432.3 39388 7145.8
## - diphtheria                       1     625.6 39582 7158.7
## - bmi                              1     775.8 39732 7168.6
## - income_composition_of_resources  1    1407.5 40364 7210.0
## - infant_deaths                    1    1640.7 40597 7225.1
## - under_five_deaths                1    1708.9 40665 7229.5
## - schooling                        1    4021.5 42978 7374.6
## - adult_mortality                  1    6958.3 45914 7548.1
## - hiv_aids                         1   11167.2 50123 7778.2
## 
## Step:  AIC=7116.87
## life_expectancy ~ status + adult_mortality + infant_deaths + 
##     alcohol + percentage_expenditure + hepatitis_b + measles + 
##     bmi + under_five_deaths + polio + total_expenditure + diphtheria + 
##     hiv_aids + gdp + population + thinness_1_19_years + income_composition_of_resources + 
##     schooling
## 
##                                   Df Sum of Sq   RSS    AIC
## - hepatitis_b                      1       0.5 38957 7114.9
## - population                       1       2.6 38959 7115.0
## - alcohol                          1      26.3 38983 7116.6
## <none>                                         38956 7116.9
## - total_expenditure                1      40.3 38997 7117.6
## + thinness_5_9_years               1       0.2 38956 7118.9
## - measles                          1      60.8 39017 7119.0
## - percentage_expenditure           1      66.0 39022 7119.3
## - gdp                              1      97.7 39054 7121.4
## - thinness_1_19_years              1     158.6 39115 7125.5
## - status                           1     283.4 39240 7133.9
## - polio                            1     432.8 39389 7143.9
## - diphtheria                       1     625.5 39582 7156.7
## - bmi                              1     790.2 39746 7167.6
## - income_composition_of_resources  1    1407.3 40364 7208.0
## - infant_deaths                    1    1644.5 40601 7223.4
## - under_five_deaths                1    1710.8 40667 7227.6
## - schooling                        1    4022.3 42979 7372.7
## - adult_mortality                  1    6968.0 45924 7546.7
## - hiv_aids                         1   11181.9 50138 7777.0
## 
## Step:  AIC=7114.9
## life_expectancy ~ status + adult_mortality + infant_deaths + 
##     alcohol + percentage_expenditure + measles + bmi + under_five_deaths + 
##     polio + total_expenditure + diphtheria + hiv_aids + gdp + 
##     population + thinness_1_19_years + income_composition_of_resources + 
##     schooling
## 
##                                   Df Sum of Sq   RSS    AIC
## - population                       1       2.7 38959 7113.1
## - alcohol                          1      27.0 38984 7114.7
## <none>                                         38957 7114.9
## - total_expenditure                1      40.1 38997 7115.6
## + hepatitis_b                      1       0.5 38956 7116.9
## + thinness_5_9_years               1       0.2 38957 7116.9
## - measles                          1      60.7 39018 7117.0
## - percentage_expenditure           1      67.4 39024 7117.4
## - gdp                              1      97.3 39054 7119.4
## - thinness_1_19_years              1     160.3 39117 7123.7
## - status                           1     283.0 39240 7131.9
## - polio                            1     439.5 39396 7142.3
## - diphtheria                       1     722.4 39679 7161.1
## - bmi                              1     789.9 39747 7165.6
## - income_composition_of_resources  1    1410.0 40367 7206.2
## - infant_deaths                    1    1651.6 40608 7221.9
## - under_five_deaths                1    1716.1 40673 7226.0
## - schooling                        1    4036.0 42993 7371.6
## - adult_mortality                  1    6969.2 45926 7544.8
## - hiv_aids                         1   11193.2 50150 7775.6
## 
## Step:  AIC=7113.09
## life_expectancy ~ status + adult_mortality + infant_deaths + 
##     alcohol + percentage_expenditure + measles + bmi + under_five_deaths + 
##     polio + total_expenditure + diphtheria + hiv_aids + gdp + 
##     thinness_1_19_years + income_composition_of_resources + schooling
## 
##                                   Df Sum of Sq   RSS    AIC
## - alcohol                          1      27.2 38987 7112.9
## <none>                                         38959 7113.1
## - total_expenditure                1      39.7 38999 7113.8
## + population                       1       2.7 38957 7114.9
## + hepatitis_b                      1       0.6 38959 7115.0
## + thinness_5_9_years               1       0.2 38959 7115.1
## - measles                          1      63.5 39023 7115.4
## - percentage_expenditure           1      67.3 39027 7115.6
## - gdp                              1      97.5 39057 7117.6
## - thinness_1_19_years              1     160.6 39120 7121.9
## - status                           1     281.8 39241 7130.0
## - polio                            1     438.6 39398 7140.5
## - diphtheria                       1     725.7 39685 7159.5
## - bmi                              1     791.4 39751 7163.9
## - income_composition_of_resources  1    1409.6 40369 7204.3
## - infant_deaths                    1    1713.3 40673 7224.0
## - under_five_deaths                1    1744.0 40703 7226.0
## - schooling                        1    4048.6 43008 7370.5
## - adult_mortality                  1    6976.5 45936 7543.3
## - hiv_aids                         1   11193.5 50153 7773.8
## 
## Step:  AIC=7112.92
## life_expectancy ~ status + adult_mortality + infant_deaths + 
##     percentage_expenditure + measles + bmi + under_five_deaths + 
##     polio + total_expenditure + diphtheria + hiv_aids + gdp + 
##     thinness_1_19_years + income_composition_of_resources + schooling
## 
##                                   Df Sum of Sq   RSS    AIC
## <none>                                         38987 7112.9
## + alcohol                          1      27.2 38959 7113.1
## - total_expenditure                1      45.6 39032 7114.0
## + population                       1       3.0 38984 7114.7
## + hepatitis_b                      1       1.3 38985 7114.8
## + thinness_5_9_years               1       0.4 38986 7114.9
## - measles                          1      61.9 39049 7115.1
## - percentage_expenditure           1      71.4 39058 7115.7
## - gdp                              1      93.2 39080 7117.2
## - thinness_1_19_years              1     197.5 39184 7124.2
## - status                           1     416.7 39403 7138.8
## - polio                            1     443.1 39430 7140.6
## - diphtheria                       1     729.0 39716 7159.5
## - bmi                              1     790.0 39777 7163.6
## - income_composition_of_resources  1    1415.8 40403 7204.5
## - infant_deaths                    1    1686.2 40673 7222.0
## - under_five_deaths                1    1717.2 40704 7224.0
## - schooling                        1    4406.1 43393 7391.9
## - adult_mortality                  1    6954.9 45942 7541.7
## - hiv_aids                         1   11179.4 50166 7772.5
## 
## Call:
## lm(formula = life_expectancy ~ status + adult_mortality + infant_deaths + 
##     percentage_expenditure + measles + bmi + under_five_deaths + 
##     polio + total_expenditure + diphtheria + hiv_aids + gdp + 
##     thinness_1_19_years + income_composition_of_resources + schooling, 
##     data = subset(life_clean, select = -c(year, country, continent, 
##         region)))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -17.5984  -2.3398  -0.1235   2.2619  17.5405 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      5.524e+01  6.431e-01  85.891  < 2e-16 ***
## statusDeveloping                -1.346e+00  2.550e-01  -5.280 1.40e-07 ***
## adult_mortality                 -1.766e-02  8.189e-04 -21.570  < 2e-16 ***
## infant_deaths                    8.870e-02  8.351e-03  10.621  < 2e-16 ***
## percentage_expenditure           1.649e-04  7.544e-05   2.186 0.028931 *  
## measles                         -1.548e-05  7.605e-06  -2.036 0.041890 *  
## bmi                              3.646e-02  5.016e-03   7.270 4.74e-13 ***
## under_five_deaths               -6.602e-02  6.160e-03 -10.718  < 2e-16 ***
## polio                            2.482e-02  4.559e-03   5.445 5.68e-08 ***
## total_expenditure                5.908e-02  3.383e-02   1.746 0.080870 .  
## diphtheria                       3.168e-02  4.536e-03   6.983 3.64e-12 ***
## hiv_aids                        -4.802e-01  1.756e-02 -27.347  < 2e-16 ***
## gdp                              2.909e-05  1.165e-05   2.496 0.012609 *  
## thinness_1_19_years             -8.538e-02  2.349e-02  -3.635 0.000283 ***
## income_composition_of_resources  6.227e+00  6.399e-01   9.732  < 2e-16 ***
## schooling                        7.365e-01  4.290e-02  17.168  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.866 on 2608 degrees of freedom
## Multiple R-squared:  0.8305, Adjusted R-squared:  0.8295 
## F-statistic: 851.7 on 15 and 2608 DF,  p-value: < 2.2e-16
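
In the selected model above, `infant_deaths` and `under_five_deaths` have coefficients of similar size and opposite sign (8.870e-02 and -6.602e-02). This pattern often appears when two predictors are strongly collinear. A variance inflation factor (VIF) is one way to check. The sketch below computes VIFs by hand on synthetic data built to be collinear. The names and the data are hypothetical, and the report's output does not show VIFs for the real data.

```r
set.seed(420)
n <- 300

# Hypothetical predictors: under_five is nearly a multiple of infant
infant     <- rexp(n, rate = 1 / 30)
under_five <- 1.3 * infant + rnorm(n, sd = 2)
schooling  <- rnorm(n, mean = 12, sd = 3)
toy <- data.frame(infant, under_five, schooling)

# VIF for one predictor: 1 / (1 - R^2) from regressing it on the other predictors
vif_manual <- function(df, var) {
  others <- setdiff(names(df), var)
  r2 <- summary(lm(reformulate(others, response = var), data = df))$r.squared
  1 / (1 - r2)
}

vifs <- sapply(names(toy), function(v) vif_manual(toy, v))
vifs
```

If the real predictors show similarly large VIFs, their individual coefficients should not be interpreted separately, and dropping one of the two may be reasonable.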

## Start:  AIC=7118.86
## life_expectancy ~ status + adult_mortality + infant_deaths + 
##     alcohol + percentage_expenditure + hepatitis_b + measles + 
##     bmi + under_five_deaths + polio + total_expenditure + diphtheria + 
##     hiv_aids + gdp + population + thinness_1_19_years + thinness_5_9_years + 
##     income_composition_of_resources + schooling
## 
##                                   Df Sum of Sq   RSS    AIC
## - thinness_5_9_years               1       0.2 38956 7116.9
## - hepatitis_b                      1       0.5 38957 7116.9
## - population                       1       2.6 38959 7117.0
## - alcohol                          1      26.2 38982 7118.6
## <none>                                         38956 7118.9
## - thinness_1_19_years              1      34.7 38991 7119.2
## - total_expenditure                1      39.9 38996 7119.5
## - measles                          1      60.9 39017 7121.0
## - percentage_expenditure           1      66.0 39022 7121.3
## - gdp                              1      97.6 39054 7123.4
## - status                           1     283.0 39239 7135.9
## - polio                            1     432.3 39388 7145.8
## - diphtheria                       1     625.6 39582 7158.7
## - bmi                              1     775.8 39732 7168.6
## - income_composition_of_resources  1    1407.5 40364 7210.0
## - infant_deaths                    1    1640.7 40597 7225.1
## - under_five_deaths                1    1708.9 40665 7229.5
## - schooling                        1    4021.5 42978 7374.6
## - adult_mortality                  1    6958.3 45914 7548.1
## - hiv_aids                         1   11167.2 50123 7778.2
## 
## Step:  AIC=7116.87
## life_expectancy ~ status + adult_mortality + infant_deaths + 
##     alcohol + percentage_expenditure + hepatitis_b + measles + 
##     bmi + under_five_deaths + polio + total_expenditure + diphtheria + 
##     hiv_aids + gdp + population + thinness_1_19_years + income_composition_of_resources + 
##     schooling
## 
##                                   Df Sum of Sq   RSS    AIC
## - hepatitis_b                      1       0.5 38957 7114.9
## - population                       1       2.6 38959 7115.0
## - alcohol                          1      26.3 38983 7116.6
## <none>                                         38956 7116.9
## - total_expenditure                1      40.3 38997 7117.6
## - measles                          1      60.8 39017 7119.0
## - percentage_expenditure           1      66.0 39022 7119.3
## - gdp                              1      97.7 39054 7121.4
## - thinness_1_19_years              1     158.6 39115 7125.5
## - status                           1     283.4 39240 7133.9
## - polio                            1     432.8 39389 7143.9
## - diphtheria                       1     625.5 39582 7156.7
## - bmi                              1     790.2 39746 7167.6
## - income_composition_of_resources  1    1407.3 40364 7208.0
## - infant_deaths                    1    1644.5 40601 7223.4
## - under_five_deaths                1    1710.8 40667 7227.6
## - schooling                        1    4022.3 42979 7372.7
## - adult_mortality                  1    6968.0 45924 7546.7
## - hiv_aids                         1   11181.9 50138 7777.0
## 
## Step:  AIC=7114.9
## life_expectancy ~ status + adult_mortality + infant_deaths + 
##     alcohol + percentage_expenditure + measles + bmi + under_five_deaths + 
##     polio + total_expenditure + diphtheria + hiv_aids + gdp + 
##     population + thinness_1_19_years + income_composition_of_resources + 
##     schooling
## 
##                                   Df Sum of Sq   RSS    AIC
## - population                       1       2.7 38959 7113.1
## - alcohol                          1      27.0 38984 7114.7
## <none>                                         38957 7114.9
## - total_expenditure                1      40.1 38997 7115.6
## - measles                          1      60.7 39018 7117.0
## - percentage_expenditure           1      67.4 39024 7117.4
## - gdp                              1      97.3 39054 7119.4
## - thinness_1_19_years              1     160.3 39117 7123.7
## - status                           1     283.0 39240 7131.9
## - polio                            1     439.5 39396 7142.3
## - diphtheria                       1     722.4 39679 7161.1
## - bmi                              1     789.9 39747 7165.6
## - income_composition_of_resources  1    1410.0 40367 7206.2
## - infant_deaths                    1    1651.6 40608 7221.9
## - under_five_deaths                1    1716.1 40673 7226.0
## - schooling                        1    4036.0 42993 7371.6
## - adult_mortality                  1    6969.2 45926 7544.8
## - hiv_aids                         1   11193.2 50150 7775.6
## 
## Step:  AIC=7113.09
## life_expectancy ~ status + adult_mortality + infant_deaths + 
##     alcohol + percentage_expenditure + measles + bmi + under_five_deaths + 
##     polio + total_expenditure + diphtheria + hiv_aids + gdp + 
##     thinness_1_19_years + income_composition_of_resources + schooling
## 
##                                   Df Sum of Sq   RSS    AIC
## - alcohol                          1      27.2 38987 7112.9
## <none>                                         38959 7113.1
## - total_expenditure                1      39.7 38999 7113.8
## - measles                          1      63.5 39023 7115.4
## - percentage_expenditure           1      67.3 39027 7115.6
## - gdp                              1      97.5 39057 7117.6
## - thinness_1_19_years              1     160.6 39120 7121.9
## - status                           1     281.8 39241 7130.0
## - polio                            1     438.6 39398 7140.5
## - diphtheria                       1     725.7 39685 7159.5
## - bmi                              1     791.4 39751 7163.9
## - income_composition_of_resources  1    1409.6 40369 7204.3
## - infant_deaths                    1    1713.3 40673 7224.0
## - under_five_deaths                1    1744.0 40703 7226.0
## - schooling                        1    4048.6 43008 7370.5
## - adult_mortality                  1    6976.5 45936 7543.3
## - hiv_aids                         1   11193.5 50153 7773.8
## 
## Step:  AIC=7112.92
## life_expectancy ~ status + adult_mortality + infant_deaths + 
##     percentage_expenditure + measles + bmi + under_five_deaths + 
##     polio + total_expenditure + diphtheria + hiv_aids + gdp + 
##     thinness_1_19_years + income_composition_of_resources + schooling
## 
##                                   Df Sum of Sq   RSS    AIC
## <none>                                         38987 7112.9
## - total_expenditure                1      45.6 39032 7114.0
## - measles                          1      61.9 39049 7115.1
## - percentage_expenditure           1      71.4 39058 7115.7
## - gdp                              1      93.2 39080 7117.2
## - thinness_1_19_years              1     197.5 39184 7124.2
## - status                           1     416.7 39403 7138.8
## - polio                            1     443.1 39430 7140.6
## - diphtheria                       1     729.0 39716 7159.5
## - bmi                              1     790.0 39777 7163.6
## - income_composition_of_resources  1    1415.8 40403 7204.5
## - infant_deaths                    1    1686.2 40673 7222.0
## - under_five_deaths                1    1717.2 40704 7224.0
## - schooling                        1    4406.1 43393 7391.9
## - adult_mortality                  1    6954.9 45942 7541.7
## - hiv_aids                         1   11179.4 50166 7772.5
## 
## Call:
## lm(formula = life_expectancy ~ status + adult_mortality + infant_deaths + 
##     percentage_expenditure + measles + bmi + under_five_deaths + 
##     polio + total_expenditure + diphtheria + hiv_aids + gdp + 
##     thinness_1_19_years + income_composition_of_resources + schooling, 
##     data = subset(life_clean, select = -c(year, country, continent, 
##         region)))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -17.5984  -2.3398  -0.1235   2.2619  17.5405 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      5.524e+01  6.431e-01  85.891  < 2e-16 ***
## statusDeveloping                -1.346e+00  2.550e-01  -5.280 1.40e-07 ***
## adult_mortality                 -1.766e-02  8.189e-04 -21.570  < 2e-16 ***
## infant_deaths                    8.870e-02  8.351e-03  10.621  < 2e-16 ***
## percentage_expenditure           1.649e-04  7.544e-05   2.186 0.028931 *  
## measles                         -1.548e-05  7.605e-06  -2.036 0.041890 *  
## bmi                              3.646e-02  5.016e-03   7.270 4.74e-13 ***
## under_five_deaths               -6.602e-02  6.160e-03 -10.718  < 2e-16 ***
## polio                            2.482e-02  4.559e-03   5.445 5.68e-08 ***
## total_expenditure                5.908e-02  3.383e-02   1.746 0.080870 .  
## diphtheria                       3.168e-02  4.536e-03   6.983 3.64e-12 ***
## hiv_aids                        -4.802e-01  1.756e-02 -27.347  < 2e-16 ***
## gdp                              2.909e-05  1.165e-05   2.496 0.012609 *  
## thinness_1_19_years             -8.538e-02  2.349e-02  -3.635 0.000283 ***
## income_composition_of_resources  6.227e+00  6.399e-01   9.732  < 2e-16 ***
## schooling                        7.365e-01  4.290e-02  17.168  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.866 on 2608 degrees of freedom
## Multiple R-squared:  0.8305, Adjusted R-squared:  0.8295 
## F-statistic: 851.7 on 15 and 2608 DF,  p-value: < 2.2e-16
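
As a consistency check on the trace above: for a linear model, `step()` ranks candidates with the `extractAIC` form of AIC, n * log(RSS / n) + 2 * p, where p is the number of estimated coefficients. The sketch below plugs in the figures printed for the final step. The variable names are illustrative, and n is an assumption reconstructed as the residual degrees of freedom plus the 16 coefficients.

```r
# Figures copied from the final "<none>" row and the model summary above.
resid_df <- 2608    # residual degrees of freedom
n_coef   <- 16      # intercept + 15 slope terms
rss      <- 38987   # residual sum of squares (rounded in the printout)

# Assumed sample size: residual df plus the number of coefficients.
n <- resid_df + n_coef

# AIC as computed by extractAIC() for lm objects (up to an additive constant).
aic <- n * log(rss / n) + 2 * n_coef
print(aic)
```

The result should agree with the reported `AIC=7112.92` to within a few hundredths; the small gap comes from the RSS being rounded to a whole number in the printout.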

## 
## Call:
## lm(formula = life_expectancy ~ status + adult_mortality + infant_deaths + 
##     +bmi + under_five_deaths + polio + diphtheria + hiv_aids + 
##     income_composition_of_resources + schooling, data = life_clean)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -17.7994  -2.2965  -0.1041   2.2529  18.5435 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     55.0840506  0.5905404  93.277  < 2e-16 ***
## statusDeveloping                -2.1741396  0.2398023  -9.066  < 2e-16 ***
## adult_mortality                 -0.0179869  0.0008269 -21.752  < 2e-16 ***
## infant_deaths                    0.0846174  0.0084069  10.065  < 2e-16 ***
## bmi                              0.0446784  0.0047838   9.339  < 2e-16 ***
## under_five_deaths               -0.0644595  0.0061863 -10.420  < 2e-16 ***
## polio                            0.0248476  0.0046206   5.378 8.22e-08 ***
## diphtheria                       0.0314429  0.0045996   6.836 1.01e-11 ***
## hiv_aids                        -0.4799068  0.0176931 -27.124  < 2e-16 ***
## income_composition_of_resources  6.8176464  0.6401068  10.651  < 2e-16 ***
## schooling                        0.7794565  0.0429254  18.158  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.924 on 2613 degrees of freedom
## Multiple R-squared:  0.825,  Adjusted R-squared:  0.8244 
## F-statistic:  1232 on 10 and 2613 DF,  p-value: < 2.2e-16

## [1] 2525
## 
## Call:
## lm(formula = life_expectancy ~ status + adult_mortality + infant_deaths + 
##     +bmi + under_five_deaths + polio + diphtheria + hiv_aids + 
##     income_composition_of_resources + schooling, data = life_clean)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.3121  -2.2247  -0.1143   2.1593  17.4907 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     57.2772489  0.6003860  95.401  < 2e-16 ***
## statusDeveloping                -2.2759834  0.2309374  -9.855  < 2e-16 ***
## adult_mortality                 -0.0194412  0.0009306 -20.891  < 2e-16 ***
## infant_deaths                    0.0693023  0.0098036   7.069 2.01e-12 ***
## bmi                              0.0389316  0.0046108   8.444  < 2e-16 ***
## under_five_deaths               -0.0533229  0.0072975  -7.307 3.65e-13 ***
## polio                            0.0220567  0.0045025   4.899 1.03e-06 ***
## diphtheria                       0.0261304  0.0044883   5.822 6.56e-09 ***
## hiv_aids                        -0.5626863  0.0278559 -20.200  < 2e-16 ***
## income_composition_of_resources  6.4577600  0.6205748  10.406  < 2e-16 ***
## schooling                        0.7347332  0.0419489  17.515  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.757 on 2514 degrees of freedom
## Multiple R-squared:  0.8036, Adjusted R-squared:  0.8028 
## F-statistic:  1029 on 10 and 2514 DF,  p-value: < 2.2e-16

## Start:  AIC=6602.45
## life_expectancy ~ status + adult_mortality + infant_deaths + 
##     alcohol + percentage_expenditure + hepatitis_b + measles + 
##     bmi + under_five_deaths + polio + total_expenditure + diphtheria + 
##     hiv_aids + gdp + population + thinness_1_19_years + thinness_5_9_years + 
##     income_composition_of_resources + schooling
## 
##                                   Df Sum of Sq   RSS    AIC
## - hepatitis_b                      1       0.3 33962 6600.5
## - population                       1       1.6 33963 6600.6
## - thinness_5_9_years               1       3.1 33965 6600.7
## <none>                                         33962 6602.4
## - thinness_1_19_years              1      39.9 34001 6603.4
## - total_expenditure                1      59.0 34021 6604.8
## - alcohol                          1      63.0 34025 6605.1
## - percentage_expenditure           1      71.7 34033 6605.8
## - measles                          1      77.2 34039 6606.2
## - gdp                              1      97.1 34059 6607.7
## - status                           1     266.3 34228 6620.2
## - polio                            1     316.2 34278 6623.8
## - diphtheria                       1     376.3 34338 6628.3
## - bmi                              1     458.3 34420 6634.3
## - infant_deaths                    1     945.7 34907 6669.8
## - under_five_deaths                1     979.2 34941 6672.2
## - income_composition_of_resources  1    1254.7 35216 6692.0
## - schooling                        1    3146.9 37108 6824.2
## - hiv_aids                         1    5616.3 39578 6986.9
## - adult_mortality                  1    5805.6 39767 6998.9
## 
## Step:  AIC=6600.47
## life_expectancy ~ status + adult_mortality + infant_deaths + 
##     alcohol + percentage_expenditure + measles + bmi + under_five_deaths + 
##     polio + total_expenditure + diphtheria + hiv_aids + gdp + 
##     population + thinness_1_19_years + thinness_5_9_years + income_composition_of_resources + 
##     schooling
## 
##                                   Df Sum of Sq   RSS    AIC
## - population                       1       1.6 33963 6598.6
## - thinness_5_9_years               1       3.1 33965 6598.7
## <none>                                         33962 6600.5
## - thinness_1_19_years              1      39.7 34002 6601.4
## - total_expenditure                1      59.3 34021 6602.9
## - alcohol                          1      62.7 34025 6603.1
## - percentage_expenditure           1      71.4 34033 6603.8
## - measles                          1      77.1 34039 6604.2
## - gdp                              1      97.6 34060 6605.7
## - status                           1     267.0 34229 6618.2
## - polio                            1     327.5 34289 6622.7
## - bmi                              1     458.7 34421 6632.3
## - diphtheria                       1     462.1 34424 6632.6
## - infant_deaths                    1     945.7 34908 6667.8
## - under_five_deaths                1     979.0 34941 6670.2
## - income_composition_of_resources  1    1254.6 35217 6690.1
## - schooling                        1    3170.3 37132 6823.8
## - hiv_aids                         1    5618.1 39580 6985.0
## - adult_mortality                  1    5805.4 39767 6996.9
## 
## Step:  AIC=6598.59
## life_expectancy ~ status + adult_mortality + infant_deaths + 
##     alcohol + percentage_expenditure + measles + bmi + under_five_deaths + 
##     polio + total_expenditure + diphtheria + hiv_aids + gdp + 
##     thinness_1_19_years + thinness_5_9_years + income_composition_of_resources + 
##     schooling
## 
##                                   Df Sum of Sq   RSS    AIC
## - thinness_5_9_years               1       3.3 33967 6596.8
## <none>                                         33963 6598.6
## - thinness_1_19_years              1      39.3 34003 6599.5
## - total_expenditure                1      58.8 34022 6601.0
## - alcohol                          1      63.0 34026 6601.3
## - percentage_expenditure           1      71.3 34035 6601.9
## - measles                          1      80.0 34043 6602.5
## - gdp                              1      97.8 34061 6603.8
## - status                           1     266.1 34230 6616.3
## - polio                            1     326.9 34290 6620.8
## - bmi                              1     459.2 34423 6630.5
## - diphtheria                       1     464.0 34427 6630.8
## - infant_deaths                    1     972.5 34936 6667.9
## - under_five_deaths                1     991.5 34955 6669.2
## - income_composition_of_resources  1    1254.5 35218 6688.2
## - schooling                        1    3179.1 37143 6822.5
## - hiv_aids                         1    5617.8 39581 6983.1
## - adult_mortality                  1    5809.1 39773 6995.3
## 
## Step:  AIC=6596.83
## life_expectancy ~ status + adult_mortality + infant_deaths + 
##     alcohol + percentage_expenditure + measles + bmi + under_five_deaths + 
##     polio + total_expenditure + diphtheria + hiv_aids + gdp + 
##     thinness_1_19_years + income_composition_of_resources + schooling
## 
##                                   Df Sum of Sq   RSS    AIC
## <none>                                         33967 6596.8
## - total_expenditure                1      60.3 34027 6599.3
## - alcohol                          1      64.2 34031 6599.6
## - percentage_expenditure           1      71.4 34038 6600.1
## - measles                          1      78.5 34045 6600.7
## - gdp                              1      98.2 34065 6602.1
## - thinness_1_19_years              1     245.9 34213 6613.1
## - status                           1     267.0 34234 6614.6
## - polio                            1     328.3 34295 6619.1
## - diphtheria                       1     462.5 34429 6629.0
## - bmi                              1     476.6 34443 6630.0
## - infant_deaths                    1     969.4 34936 6665.9
## - under_five_deaths                1     988.2 34955 6667.2
## - income_composition_of_resources  1    1253.2 35220 6686.3
## - schooling                        1    3177.0 37144 6820.6
## - hiv_aids                         1    5650.2 39617 6983.4
## - adult_mortality                  1    5825.7 39792 6994.5
## 
## Call:
## lm(formula = life_expectancy ~ status + adult_mortality + infant_deaths + 
##     alcohol + percentage_expenditure + measles + bmi + under_five_deaths + 
##     polio + total_expenditure + diphtheria + hiv_aids + gdp + 
##     thinness_1_19_years + income_composition_of_resources + schooling, 
##     data = life_clean1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.4795  -2.2931  -0.1227   2.1553  15.8087 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      5.753e+01  6.560e-01  87.700  < 2e-16 ***
## statusDeveloping                -1.176e+00  2.650e-01  -4.440 9.38e-06 ***
## adult_mortality                 -1.905e-02  9.186e-04 -20.740  < 2e-16 ***
## infant_deaths                    8.351e-02  9.871e-03   8.460  < 2e-16 ***
## alcohol                          5.539e-02  2.545e-02   2.176  0.02962 *  
## percentage_expenditure           1.651e-04  7.191e-05   2.296  0.02178 *  
## measles                         -2.092e-05  8.690e-06  -2.407  0.01615 *  
## bmi                              2.858e-02  4.817e-03   5.932 3.40e-09 ***
## under_five_deaths               -6.225e-02  7.287e-03  -8.542  < 2e-16 ***
## polio                            2.176e-02  4.419e-03   4.923 9.06e-07 ***
## total_expenditure                6.932e-02  3.285e-02   2.110  0.03494 *  
## diphtheria                       2.573e-02  4.403e-03   5.844 5.76e-09 ***
## hiv_aids                        -5.670e-01  2.776e-02 -20.425  < 2e-16 ***
## gdp                              2.992e-05  1.111e-05   2.693  0.00713 ** 
## thinness_1_19_years             -1.003e-01  2.353e-02  -4.261 2.11e-05 ***
## income_composition_of_resources  5.929e+00  6.163e-01   9.619  < 2e-16 ***
## schooling                        6.548e-01  4.275e-02  15.316  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.68 on 2508 degrees of freedom
## Multiple R-squared:  0.812,  Adjusted R-squared:  0.8108 
## F-statistic:   677 on 16 and 2508 DF,  p-value: < 2.2e-16

## 
## Call:
## lm(formula = life_expectancy ~ status + adult_mortality + infant_deaths + 
##     bmi + under_five_deaths + total_expenditure + diphtheria + 
##     hiv_aids + gdp + thinness_1_19_years + income_composition_of_resources + 
##     schooling, data = life_clean1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.4482  -2.2757  -0.1305   2.1693  16.6982 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      5.813e+01  6.432e-01  90.373  < 2e-16 ***
## statusDeveloping                -1.469e+00  2.441e-01  -6.015 2.06e-09 ***
## adult_mortality                 -1.893e-02  9.212e-04 -20.545  < 2e-16 ***
## infant_deaths                    7.881e-02  9.713e-03   8.114 7.55e-16 ***
## bmi                              2.921e-02  4.824e-03   6.055 1.61e-09 ***
## under_five_deaths               -5.938e-02  7.214e-03  -8.231 2.95e-16 ***
## total_expenditure                8.162e-02  3.293e-02   2.478   0.0133 *  
## diphtheria                       3.854e-02  3.612e-03  10.670  < 2e-16 ***
## hiv_aids                        -5.681e-01  2.771e-02 -20.501  < 2e-16 ***
## gdp                              5.136e-05  6.387e-06   8.041 1.36e-15 ***
## thinness_1_19_years             -1.070e-01  2.302e-02  -4.646 3.56e-06 ***
## income_composition_of_resources  5.856e+00  6.201e-01   9.444  < 2e-16 ***
## schooling                        6.996e-01  4.184e-02  16.720  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.707 on 2512 degrees of freedom
## Multiple R-squared:  0.809,  Adjusted R-squared:  0.8081 
## F-statistic: 886.5 on 12 and 2512 DF,  p-value: < 2.2e-16

## [1] 3.835798
## 
## Call:
## lm(formula = life_expectancy ~ status + log1p(adult_mortality) + 
##     log1p(bmi) + log1p(infant_deaths) + total_expenditure + diphtheria + 
##     hiv_aids + log1p(gdp) + log1p(thinness_1_19_years) + income_composition_of_resources + 
##     schooling, data = life_clean1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.1168  -2.5378  -0.1681   2.3135  15.3633 
## 
## Coefficients:
##                                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     57.91524    0.93765  61.766  < 2e-16 ***
## statusDeveloping                -1.14888    0.24814  -4.630 3.84e-06 ***
## log1p(adult_mortality)          -0.86561    0.08429 -10.269  < 2e-16 ***
## log1p(bmi)                       0.24714    0.11845   2.086   0.0370 *  
## log1p(infant_deaths)            -0.68126    0.05959 -11.433  < 2e-16 ***
## total_expenditure                0.06821    0.03454   1.975   0.0484 *  
## diphtheria                       0.04212    0.00373  11.291  < 2e-16 ***
## hiv_aids                        -0.75770    0.02633 -28.776  < 2e-16 ***
## log1p(gdp)                       0.47854    0.05400   8.863  < 2e-16 ***
## log1p(thinness_1_19_years)      -1.04665    0.14609  -7.164 1.02e-12 ***
## income_composition_of_resources  7.86418    0.64373  12.217  < 2e-16 ***
## schooling                        0.61234    0.04480  13.667  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.85 on 2513 degrees of freedom
## Multiple R-squared:  0.7938, Adjusted R-squared:  0.7929 
## F-statistic: 879.7 on 11 and 2513 DF,  p-value: < 2.2e-16

## [1] 3.985237
## 
##  studentized Breusch-Pagan test
## 
## data:  small_cleaner_backward_log
## BP = 218.74, df = 11, p-value < 2.2e-16
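
The studentized Breusch-Pagan statistic reported above can be reproduced by hand: regress the squared residuals on the model's predictors and multiply the auxiliary R-squared by n (Koenker's version, which is what `lmtest::bptest` computes with its default `studentize = TRUE`). Below is a minimal base-R sketch on the built-in `mtcars` data, used only as a stand-in because `life_clean1` is not available in this snippet. The last lines evaluate the p-value for the statistic printed above.

```r
# Stand-in model on a built-in dataset (not the life-expectancy data).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Auxiliary regression: squared residuals on the same predictors.
aux_data <- transform(mtcars, e2 = resid(fit)^2)
aux      <- lm(e2 ~ wt + hp, data = aux_data)

n       <- nobs(fit)
bp_stat <- n * summary(aux)$r.squared    # Koenker's studentized BP statistic
bp_df   <- length(coef(aux)) - 1         # one df per predictor
p_value <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)
print(c(BP = bp_stat, df = bp_df, p.value = p_value))

# P-value implied by the statistic reported above (BP = 218.74, df = 11).
p_reported <- pchisq(218.74, df = 11, lower.tail = FALSE)
print(p_reported)
```

A small p-value rejects the null hypothesis of constant error variance, which is the conclusion the reported test points to for the log-transformed model.
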
## 
## Call:
## lm(formula = log1p(life_expectancy) ~ status + log1p(adult_mortality) + 
##     log1p(bmi) + log1p(under_five_deaths) + total_expenditure + 
##     diphtheria + hiv_aids + log1p(gdp) + log1p(thinness_1_19_years) + 
##     income_composition_of_resources + schooling, data = life_clean1)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.219788 -0.034180 -0.000628  0.033872  0.219947 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      4.072e+00  1.365e-02 298.391  < 2e-16 ***
## statusDeveloping                -8.827e-03  3.589e-03  -2.459   0.0140 *  
## log1p(adult_mortality)          -1.200e-02  1.220e-03  -9.829  < 2e-16 ***
## log1p(bmi)                       4.242e-03  1.715e-03   2.473   0.0135 *  
## log1p(under_five_deaths)        -1.106e-02  8.244e-04 -13.419  < 2e-16 ***
## total_expenditure                8.463e-04  4.999e-04   1.693   0.0906 .  
## diphtheria                       6.518e-04  5.403e-05  12.065  < 2e-16 ***
## hiv_aids                        -1.196e-02  3.817e-04 -31.339  < 2e-16 ***
## log1p(gdp)                       6.422e-03  7.822e-04   8.210 3.50e-16 ***
## log1p(thinness_1_19_years)      -1.281e-02  2.115e-03  -6.058 1.58e-09 ***
## income_composition_of_resources  1.154e-01  9.316e-03  12.389  < 2e-16 ***
## schooling                        8.779e-03  6.506e-04  13.493  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05572 on 2513 degrees of freedom
## Multiple R-squared:  0.7984, Adjusted R-squared:  0.7975 
## F-statistic: 904.7 on 11 and 2513 DF,  p-value: < 2.2e-16
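
Because this model predicts `log1p(life_expectancy)`, its residual standard error (0.05572) is on the log scale and cannot be compared directly with the residual standard errors of the untransformed models above. One way to put the models on the same footing is to back-transform predictions with `expm1()` before computing RMSE. The sketch below uses the built-in `mtcars` data as a stand-in, and the helper name `rmse` is an assumption for illustration, not taken from the report.

```r
# Hypothetical helper: root mean squared error.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Stand-in model with a log1p-transformed response.
fit_log <- lm(log1p(mpg) ~ wt + hp, data = mtcars)

# Predictions come back on the log1p scale; expm1() inverts log1p().
pred_log  <- predict(fit_log, newdata = mtcars)
pred_orig <- expm1(pred_log)

rmse_log  <- rmse(log1p(mtcars$mpg), pred_log)   # log scale
rmse_orig <- rmse(mtcars$mpg, pred_orig)         # original units
print(c(log_scale = rmse_log, original_scale = rmse_orig))
```

Only the value on the original scale is comparable with the RMSE of a model fitted to the untransformed response.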

## [1] 65.102

## [1] 1.545455
## 
## Call:
## lm(formula = life_expectancy^2 ~ status + log1p(adult_mortality) + 
##     log1p(bmi) + log1p(under_five_deaths) + total_expenditure + 
##     diphtheria + hiv_aids + log1p(gdp) + log1p(thinness_1_19_years) + 
##     income_composition_of_resources + schooling, data = life_clean1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1534.41  -353.56   -38.63   316.95  2135.98 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     3535.6599   130.7043  27.051  < 2e-16 ***
## statusDeveloping                -243.9356    34.3816  -7.095 1.68e-12 ***
## log1p(adult_mortality)          -120.5440    11.6899 -10.312  < 2e-16 ***
## log1p(bmi)                        20.6667    16.4297   1.258   0.2086    
## log1p(under_five_deaths)         -91.7842     7.8964 -11.624  < 2e-16 ***
## total_expenditure                 10.7517     4.7883   2.245   0.0248 *  
## diphtheria                         5.1823     0.5175  10.014  < 2e-16 ***
## hiv_aids                         -92.8286     3.6563 -25.389  < 2e-16 ***
## log1p(gdp)                        66.9525     7.4928   8.936  < 2e-16 ***
## log1p(thinness_1_19_years)      -156.7699    20.2606  -7.738 1.46e-14 ***
## income_composition_of_resources 1087.7867    89.2291  12.191  < 2e-16 ***
## schooling                         79.7725     6.2318  12.801  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 533.7 on 2513 degrees of freedom
## Multiple R-squared:  0.7887, Adjusted R-squared:  0.7878 
## F-statistic: 852.7 on 11 and 2513 DF,  p-value: < 2.2e-16

## [1] 0
## [1] 4.246683

## 
## Call:
## lm(formula = life_expectancy ~ status + log1p(adult_mortality) + 
##     I(bmi^2) * status + log1p(under_five_deaths) + log1p(total_expenditure) + 
##     diphtheria + I(diphtheria^2) + hiv_aids + I(hiv_aids^2) + 
##     log1p(gdp) + log1p(thinness_1_19_years) + income_composition_of_resources + 
##     I(income_composition_of_resources^2) + I(income_composition_of_resources^3) + 
##     schooling + I(schooling^2) + I(schooling^3), data = life_clean1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.3279  -2.0054  -0.1119   1.8930  13.7345 
## 
## Coefficients:
##                                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                           6.770e+01  1.070e+00  63.282  < 2e-16 ***
## statusDeveloping                     -5.138e-01  4.934e-01  -1.041  0.29775    
## log1p(adult_mortality)               -6.216e-01  7.243e-02  -8.582  < 2e-16 ***
## I(bmi^2)                             -2.024e-04  1.247e-04  -1.623  0.10464    
## log1p(under_five_deaths)             -2.535e-01  5.186e-02  -4.889 1.08e-06 ***
## log1p(total_expenditure)              8.234e-01  1.851e-01   4.450 8.98e-06 ***
## diphtheria                           -4.274e-02  1.354e-02  -3.157  0.00161 ** 
## I(diphtheria^2)                       6.426e-04  1.210e-04   5.311 1.19e-07 ***
## hiv_aids                             -1.298e+00  5.243e-02 -24.764  < 2e-16 ***
## I(hiv_aids^2)                         3.105e-02  2.186e-03  14.207  < 2e-16 ***
## log1p(gdp)                            9.685e-02  4.811e-02   2.013  0.04423 *  
## log1p(thinness_1_19_years)           -7.074e-01  1.333e-01  -5.306 1.22e-07 ***
## income_composition_of_resources      -4.477e+01  4.172e+00 -10.731  < 2e-16 ***
## I(income_composition_of_resources^2)  1.052e+02  1.101e+01   9.558  < 2e-16 ***
## I(income_composition_of_resources^3) -4.710e+01  7.454e+00  -6.318 3.13e-10 ***
## schooling                             2.354e-01  2.481e-01   0.949  0.34277    
## I(schooling^2)                       -2.845e-04  2.799e-02  -0.010  0.99189    
## I(schooling^3)                       -3.337e-04  8.956e-04  -0.373  0.70946    
## statusDeveloping:I(bmi^2)             1.057e-04  1.435e-04   0.736  0.46156    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.269 on 2506 degrees of freedom
## Multiple R-squared:  0.8517, Adjusted R-squared:  0.8507 
## F-statistic: 799.8 on 18 and 2506 DF,  p-value: < 2.2e-16

## [1] 3.733912
## 
##  studentized Breusch-Pagan test
## 
## data:  small_cleaner_backward_log_poly
## BP = 179.56, df = 18, p-value < 2.2e-16
## 
## Call:
## lm(formula = life_expectancy ~ status + log1p(adult_mortality) + 
##     I(bmi^2) * status + log1p(infant_deaths) + I(infant_deaths^2) + 
##     log1p(total_expenditure) + I(gdp^2) + diphtheria + I(diphtheria^2) + 
##     hiv_aids + I(hiv_aids^2) + log1p(gdp) + log1p(thinness_1_19_years) + 
##     income_composition_of_resources + I(income_composition_of_resources^2) + 
##     schooling + I(schooling^2), data = life_clean1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.3087  -2.0397  -0.1989   1.9317  14.0502 
## 
## Coefficients:
##                                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                           6.523e+01  1.005e+00  64.897  < 2e-16 ***
## statusDeveloping                     -1.213e-01  4.967e-01  -0.244  0.80706    
## log1p(adult_mortality)               -5.972e-01  7.331e-02  -8.146 5.85e-16 ***
## I(bmi^2)                             -2.028e-04  1.261e-04  -1.608  0.10790    
## log1p(infant_deaths)                 -3.246e-01  5.559e-02  -5.839 5.91e-09 ***
## I(infant_deaths^2)                    1.264e-06  4.590e-07   2.755  0.00592 ** 
## log1p(total_expenditure)              7.334e-01  1.872e-01   3.919 9.15e-05 ***
## I(gdp^2)                             -1.190e-10  8.685e-11  -1.370  0.17074    
## diphtheria                           -5.497e-02  1.372e-02  -4.006 6.36e-05 ***
## I(diphtheria^2)                       7.948e-04  1.222e-04   6.502 9.53e-11 ***
## hiv_aids                             -1.373e+00  5.191e-02 -26.456  < 2e-16 ***
## I(hiv_aids^2)                         3.386e-02  2.181e-03  15.528  < 2e-16 ***
## log1p(gdp)                            1.465e-01  5.066e-02   2.892  0.00386 ** 
## log1p(thinness_1_19_years)           -7.924e-01  1.367e-01  -5.798 7.54e-09 ***
## income_composition_of_resources      -2.288e+01  1.661e+00 -13.777  < 2e-16 ***
## I(income_composition_of_resources^2)  4.003e+01  2.073e+00  19.304  < 2e-16 ***
## schooling                             6.079e-01  1.070e-01   5.683 1.48e-08 ***
## I(schooling^2)                       -2.308e-02  5.271e-03  -4.379 1.24e-05 ***
## statusDeveloping:I(bmi^2)             1.988e-04  1.444e-04   1.376  0.16891    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.312 on 2506 degrees of freedom
## Multiple R-squared:  0.8478, Adjusted R-squared:  0.8467 
## F-statistic: 775.8 on 18 and 2506 DF,  p-value: < 2.2e-16

## [1] 3.828813
## 
##  studentized Breusch-Pagan test
## 
## data:  small_cleaner_backward_log_poly
## BP = 199.8, df = 18, p-value < 2.2e-16

## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced

## Warning in predict.lm(small_cleaner_backward_log_poly, newdata = le_tst_data):
## prediction from a rank-deficient fit may be misleading
## [1] 3.286007
## 
##  studentized Breusch-Pagan test
## 
## data:  small_cleaner_backward_log_poly
## BP = 342.24, df = 162, p-value = 5.792e-15
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## 
## Call:
## glm(formula = life_expectancy_class ~ status + log1p(adult_mortality) + 
##     I(bmi^2) * status + log1p(under_five_deaths) + log1p(total_expenditure) + 
##     diphtheria + I(diphtheria^2) + hiv_aids + I(hiv_aids^2) + 
##     log1p(gdp) + log1p(thinness_1_19_years) + income_composition_of_resources + 
##     I(income_composition_of_resources^2) + schooling + I(schooling^2), 
##     family = "binomial", data = life_clean_glm)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -3.03606   0.00003   0.00061   0.00929   0.73038  
## 
## Coefficients:
##                                        Estimate Std. Error z value Pr(>|z|)  
## (Intercept)                           5.201e+01  4.412e+03   0.012   0.9906  
## statusDeveloping                     -3.768e+00  4.412e+03  -0.001   0.9993  
## log1p(adult_mortality)               -6.510e-01  6.770e-01  -0.962   0.3362  
## I(bmi^2)                             -3.904e-05  1.386e+00   0.000   1.0000  
## log1p(under_five_deaths)             -1.065e+00  5.876e-01  -1.813   0.0698 .
## log1p(total_expenditure)             -2.014e+00  1.921e+00  -1.048   0.2945  
## diphtheria                           -7.891e-01  5.971e-01  -1.321   0.1863  
## I(diphtheria^2)                       5.199e-03  3.906e-03   1.331   0.1832  
## hiv_aids                             -3.548e-01  3.261e-01  -1.088   0.2766  
## I(hiv_aids^2)                         1.579e-02  1.888e-02   0.836   0.4032  
## log1p(gdp)                           -2.296e-01  4.611e-01  -0.498   0.6185  
## log1p(thinness_1_19_years)           -8.196e-01  1.060e+00  -0.773   0.4396  
## income_composition_of_resources       1.189e-01  1.192e+01   0.010   0.9920  
## I(income_composition_of_resources^2)  2.001e+01  2.470e+01   0.810   0.4178  
## schooling                            -2.479e-01  1.255e+00  -0.198   0.8434  
## I(schooling^2)                        5.314e-04  7.500e-02   0.007   0.9943  
## statusDeveloping:I(bmi^2)            -6.544e-05  1.386e+00   0.000   1.0000  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 84.493  on 2524  degrees of freedom
## Residual deviance: 50.729  on 2508  degrees of freedom
## AIC: 84.729
## 
## Number of Fisher Scoring iterations: 21
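
The warning `fitted probabilities numerically 0 or 1 occurred`, the standard errors in the thousands on `(Intercept)` and `statusDeveloping`, and the 21 Fisher scoring iterations are consistent with (quasi-)complete separation, where some combination of predictors classifies the response almost perfectly. The toy example below is a sketch on made-up data, not the report's `life_clean_glm`, that reproduces the same warning.

```r
# Made-up data in which x separates y perfectly (y = 1 exactly when x >= 5).
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(0, 0, 0, 0, 1, 1, 1, 1)

# Collect warnings instead of printing them, so they can be inspected.
warned <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial),
  warning = function(w) {
    warned <<- c(warned, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(warned)
print(summary(fit)$coefficients)   # note the inflated standard errors
```

In that situation the Wald z-tests in the coefficient table are unreliable, so large p-values there should not be read as evidence that the predictors are unimportant.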

Appendix

## 
## Call:
## lm(formula = life_expectancy ~ ., data = subset(le_trn_data, 
##     select = -c(year, continent, status, country)))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -22.0871  -2.0703  -0.1235   1.9829  14.3539 
## 
## Coefficients:
##                                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                       5.735e+01  5.817e-01  98.591  < 2e-16 ***
## adult_mortality                  -1.490e-02  7.896e-04 -18.874  < 2e-16 ***
## infant_deaths                     4.676e-02  8.539e-03   5.476 4.76e-08 ***
## alcohol                           8.362e-02  2.716e-02   3.078 0.002102 ** 
## percentage_expenditure            2.592e-04  7.363e-05   3.521 0.000437 ***
## hepatitis_b                      -1.560e-03  3.867e-03  -0.404 0.686613    
## measles                          -1.036e-05  7.342e-06  -1.411 0.158476    
## bmi                               5.916e-03  5.226e-03   1.132 0.257779    
## under_five_deaths                -3.592e-02  6.232e-03  -5.765 9.14e-09 ***
## polio                             2.207e-02  4.421e-03   4.993 6.34e-07 ***
## total_expenditure                 3.291e-02  3.321e-02   0.991 0.321744    
## diphtheria                        2.471e-02  4.743e-03   5.209 2.05e-07 ***
## hiv_aids                         -3.662e-01  1.755e-02 -20.865  < 2e-16 ***
## gdp                               2.949e-05  1.146e-05   2.574 0.010119 *  
## population                       -2.744e-10  1.773e-09  -0.155 0.877014    
## thinness_1_19_years              -3.122e-02  4.779e-02  -0.653 0.513684    
## thinness_5_9_years               -7.835e-02  4.645e-02  -1.687 0.091780 .  
## income_composition_of_resources   6.053e+00  6.126e-01   9.881  < 2e-16 ***
## schooling                         6.489e-01  4.261e-02  15.230  < 2e-16 ***
## regionEurope & Central Asia       1.750e-01  2.845e-01   0.615 0.538452    
## regionLatin America & Caribbean   8.310e-01  2.808e-01   2.959 0.003112 ** 
## regionMiddle East & North Africa  8.858e-01  3.152e-01   2.810 0.004985 ** 
## regionNorth America               7.232e-01  7.879e-01   0.918 0.358733    
## regionSouth Asia                  9.730e-01  5.291e-01   1.839 0.066026 .  
## regionSub-Saharan Africa         -4.609e+00  3.007e-01 -15.328  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.689 on 2610 degrees of freedom
## Multiple R-squared:  0.8509, Adjusted R-squared:  0.8495 
## F-statistic: 620.5 on 24 and 2610 DF,  p-value: < 2.2e-16